Abstract

This paper reviews a broad spectrum of methodologies pertinent to
studies of schooling effects. Methodological issues and problems are
addressed according to a three-dimensional conceptual framework
consisting of: (1) indicators of schooling effects, (2) study approaches,
and (3) units of analysis. Problems and uses of status attainment and
difference scores as indicators of schooling effects are discussed first.
Study approaches to schooling effects are divided into two broad
categories: experimental and nonexperimental. Methodological issues
related to the experimental approaches are discussed in relation to two
types of designs: experimental-group-only designs and control-group
designs. Problems related to the nonexperimental approaches are
reviewed according to: partitioning of explained variance, comparison of
regression coefficients, nonlinear regression methods, and causal
models. Issues and problems related to the units of analysis are
presented by contrasting two positions: that data should be analyzed
at the individual student level, and that data should be analyzed at the
classroom, school, or district level. A third position has emerged:
that multilevel analyses should be performed because schooling effects
might result from many sources at many different levels. Finally, some
methodological trends are identified and their implications for further
schooling effects studies are briefly considered.

I. INTRODUCTION

Large-scale studies on student achievement (and secondary analyses
thereof) concerned with the relative effects of schools, programs,
and/or teachers have consistently yielded findings that challenge even
our most cherished beliefs about the impact of education in America
(Averch, Carroll, Donaldson, Kiesling, & Pincus, 1972; Cicirelli,
Cooper, & Granger, 1969; Coleman, Campbell, Hobson, McPartland, Mood,
Weinfeld, & York, 1966; Heath & Nielsen, 1974; Jencks, Smith, Acland,
Bane, Cohen, Gintis, Heyns, & Michelson, 1972; and Mayeske, Wisler,
Beaton, Weinfeld, Cohen, Okada, Proshek, & Tabler, 1969). However,
critics of these studies, such as Cain and Watts (1970), Campbell and
Erlebacher (1970), Guthrie (1973), and Hanushek and Kain (1972),
ask: Are the results of these studies truly reflective of our
schooling efforts, or are they at least partly artifacts of the research
methodologies used by behavioral and/or social scientists as they study
schooling effects?

It is instructive to examine what several educational researchers
have had to say in response to such a question. Cain and Watts indicated
that "the analytical part of the Coleman Report has such methodological
shortcomings that it offers little policy guidance" (p. 228). In a
scholarly critique of the Westinghouse/Ohio University study of
compensatory education, Campbell and Erlebacher concluded: "It is tragic that
the social experiment evaluation most cited by presidents, most
influential in government decision making, should have contained such a
misleading bias" (p. 203). As to the assignment of blame, they responded:
"In this instance, the failure came from the inadequacies of the social
science methodological community (including education, psychology,
economics, and sociology) which as a population was not ready for this task"
(p. 204). Herriott and Muse (1973) believe that "we currently lack both
the conception and methodological tools essential for an unambiguous
attribution of educational effects among competing explanatory variables"
(p. 23X). And Cronbach (1976), in a paper examining alternative ways
of analyzing data, expressed his deeply felt concerns about methodology
currently in use in educational research:

1. The majority of studies of educational effects -- whether
classroom experiments, or evaluation of programs, or
surveys -- have collected and analyzed data in ways that
conceal more than they reveal. The established methods
have generated false conclusions in many studies.

2. The traditional research strategy -- pitting substantive
hypotheses against a null hypothesis and requiring
statistical significance of effects -- can rarely be used in
educational research. Samples large enough to detect
strong but probabilistic effects are likely to be
prohibitively costly. (p. 1)

Such critiques have stimulated researchers to consider carefully
the advantages and disadvantages of employing one method over another,
and have called attention to the need for methodologies that can be
employed as alternatives to established practice. The use of
"commonality analysis" (Mood, 1971) in the Instructional Dimensions
Study (Brady, Clinton, Sweeney, Peterson, & Poyner, 1977), of
"path analysis" (Blalock, 1964; Duncan, 1966; Werts & Linn, 1970b;
Wright, 1921) in the Beginning Teachers Evaluation Study, Phase II
(McDonald & Elias, 1976), and of "polynomial regression analyses"
(Cohen & Cohen, 1975; Fisher, 1925; Kerlinger & Pedhazur, 1973;
Pearson, 1905) in the Follow Through Classroom Process Measurement
and Pupil Growth Study (Soar & Soar, 1972) illustrates several of the
more recent attempts to make use of more appropriate research
methodologies in schooling effects studies. The Horst, Tallmadge, and
Wood (1975) paper, developed in an attempt to improve the methodology
used in the evaluation of educational programs in Title I, has had an
impact well beyond its relatively limited goals, and is yet another
example of the potential usefulness of endeavors such as this.

Purposes 

The major purposes of this review and synthesis effort are:
(1) to examine issues and problems associated with methodologies
pertinent to research on schooling effects, (2) to call attention to
recent developments in relevant research methodology, and (3) to
describe general methodological trends in the study of schooling
effects. It is anticipated that the knowledge bases established here
will facilitate efforts to select and utilize methodologies that will
be effective and feasible for providing feedback to developers of
technology designed to assist practitioners to identify and exploit
opportunities for improving instruction and its outcomes.

Methods and Procedures 

An Organizational Schema

The sheer number and richness of methods employed to study
schooling effects represented both a blessing and a curse to the reviewers:
there was no lack of methodological areas to explore but seemingly no
simple way by which discussions of those topics could be organized.
In the process of seeking ways to develop an organizational framework
for this review, it proved useful to assume that extant methodologies
could be located in a multidimensional space; that is, they could be
characterized by a limited set of dimensions or facets (Perkins, 1977;
Willems, 1969). Facet design, as viewed by Runkel and McGrath (1972),
"is a way of laying out a domain for research; it specifies the limits
of the domain and the presumed ordering of its parts" (p. 20).

The descriptive space shown in Figure I is designed to represent
both the domain of interest (i.e., research methodologies pertinent to
schooling effects studies) and the organizational schema by which the
review is ordered and delimited. The three displayed facets, that is,
indicators of schooling effects, study approaches, and units of analysis,
answer, respectively, the questions: What? How? Which analytic
unit? Section II of this review deals with issues and problems related
to the indicators of schooling effects, that is, the "what?" dimension.
Issues and problems related to study approaches in research on
schooling effects, or the "how?" dimension, are discussed in Section
III. In Section IV are found discussions of issues and problems
related to the unit of analysis and the analysis of multilevel data in
studies of schooling effects, that is, the "which analytic unit?"
dimension. A final section contains a review of methodological trends
and a brief discussion of the implications that existing methodologies
have for researchers in the design and conduct of schooling effects
studies.

Figure I. Organizational schema for reviewing research methodologies
pertinent to the study of schooling effects.

Search Delimitations 

Porter and McDaniels (1974) have convincingly argued that design
and measurement issues are as important to consider in schooling
effects studies as they are in educational research in general.
Despite this, a variety of practical constraints prevents the authors
from dealing with both design and measurement issues in this paper.
This is not meant to suggest that one area is more important than the
other, but merely that the authors have chosen to focus on one and
not the other. The present review, therefore, concentrates almost
exclusively upon issues of design and statistical analysis.

In addition, the amount of literature related to research
methodologies was so great that it became absolutely necessary not only to
limit this review to the domain of interest (as defined by Figure I),
but also to be highly selective in its treatment. Some of the issues
discussed in the later sections are briefly described in the paragraphs
below and are accompanied by remarks of a delimiting nature.



Indicators of schooling effects. The question of what it is that
is to be measured in studies concerned with schooling effects can be
dealt with from at least two aspects. One might, for example, ask:
What are schooling effects? It is the authors' intent to leave the
resolution of such political and philosophical problems to others;
nevertheless, it is necessary to point out that schooling effects
studies have mainly been concerned with examining the effects of
schooling on "immediate" student outcomes, such as student
achievement in the basic skills areas. This historical concern is reflected
in this paper. A second question that can be asked from a
methodological standpoint is: What kinds of measurements should be used
as indicators of schooling effects? This paper examines two major
indicators of schooling effects: (1) status attainment or outcomes
(i.e., effects measured at a single point in time), and (2) differences
(i.e., effects resulting from differences between measurements occurring
at two points in time or between observed and predicted outcomes).

Study approaches in research on schooling effects. Perhaps the
most crucial methodological question is: How should schooling effects
be studied? In responding to this issue we distinguish, as did
Cronbach (1957), between two major study approaches: the experimental
and the correlational (or nonexperimental). Discussed under the
experimental category (which encompasses pre-experimental,
true-experimental, and quasi-experimental designs) are
"experimental-group-only designs" and "control-group designs." [...]
type approaches to research is discussed under the nonexperimental
category. Also discussed briefly are efforts to combine the
experimental and nonexperimental approaches.

The descriptive case study approach will be excluded from this
particular review since ethnographic methodology suited for the study of
schooling effects is so different from the main thrust of this paper that
its inclusion may be more distracting than beneficial.

Units of analysis. Methodological issues related to the units of
analysis (and levels of data aggregation) facet are presented by
contrasting two types of analytic units; that is, units at the individual
student (or noncollective) level and units at the collective level
(e.g., classrooms, schools, school districts). Additional discussion
deals with the analysis of multilevel data. Not discussed are
methodologies used for analyzing data from studies involving single students
or groups (i.e., n = 1, or one-shot case studies).

Search Strategy

In keeping with the need to economize time and resources for this
investigation, the literature search began with an examination of
relevant articles appearing in recent volumes of the Review of
Educational Research and the Review of Research in Education. This
initial step resulted in a list of methodological topics from which
an outline was generated. Key papers and reports related to each
topical area were then identified, obtained, reviewed, and annotated.
Treated as "key literature" were those papers or reports: (1) in which
methods or ideas listed on the topical outline were originally proposed,
(2) that were related to methods used in major and/or controversial
studies, or (3) that suggested new directions and approaches.
Additional references were examined depending upon the nature of their
citation in the key literature, and upon the recommendations of a panel
of external reviewers. A final reorganization of the topical outline
was helped by comments from this same panel of experts.

II. ISSUES AND PROBLEMS RELATED TO
INDICATORS OF SCHOOLING EFFECTS

Research can be construed as the process of seeking relationships
between variables (Gage, 1976a). This, of course, is also true of
schooling effects research. All such studies, by definition, are
concerned with the exploration of relationships between independent and
dependent variables which represent, respectively, the "effectants" and
the "effects" of schooling. From the measurement point of view, one
can distinguish between two major types of dependent variables: "status
attainment or outcomes" and "differences." Status attainment or outcomes
indicators are measures of effect taken at a particular moment in time.
Differences indicators, or scores representing degrees of discrepancy
between two measurements on the same scale, can be subdivided into scores
derived from differences between measures taken at two different points in
time and scores representing differences between observed and predicted
outcomes. Issues and problems related to these indicators of schooling
effects are discussed below. A final section summarizes this knowledge
base and reviews the advantages and disadvantages of both types of
indicators.

Status Attainment or Outcomes as
Indicators of Schooling Effects



All measurements taken on a student at a single point in time are
to be regarded as indicators of "status attainment or outcomes."



A major class of schooling effects studies that employs indicators
of status attainment or outcomes is educational assessment programs.
Programs or studies such as the IEA studies (e.g., Comber & Keeves,
1973; Purves, 1973; Thorndike, 1973), the National Assessment of
Educational Progress program (e.g., NAEP, 1974), and the Educational
Quality Assessment or EQA program (e.g., Pennsylvania Department of
Education, 1973) use status attainment data.

Current efforts to establish minimal competency levels as a
prerequisite to graduation (Madaus & Airasian, 1977; Pipho, 1977)
represent another example of the use of status attainment indicators
to assess schooling effects. A more complex example is the
traditional dependence on class standing, grade point average, and scores on
entrance examinations as the basis for admission to college.

Finally, studies of long-term schooling effects on such
noncognitive variables as occupational status and income have also
involved the collection of status attainment data (e.g., Fagerlind, 1975;
Flanagan & Cooley, 1966; Flanagan, Dailey, Shaycoft, Gorham, Orr,
& Goldberg, 1964; Jencks et al., 1972).

Difference Scores as Indicators of Schooling Effects

All scores representing degrees of discrepancy between two
measurements on the same scale are to be regarded as indicators of
differences. Two major types of difference indicators, that is,
change or gain scores and residual scores, are discussed in this
section.



Change as Indicators of Schooling Effects



In educational research, schooling effects frequently are
evaluated on the basis of the amount of change in those observable student
behaviors thought attributable to the school, program, teacher, or some
combination thereof (e.g., McDonald & Elias, 1976). Almost all of the
existing schooling effects studies have been designed to examine the
immediate effects of schooling and, for purposes of analysis, have
utilized adjusted or unadjusted change scores derived from calculating
differences in pretest to posttest performance (Type A change scores).
In contrast, the few studies that have researched after-school effects
(or what Harnqvist (1977) refers to as the "enduring effects of
schooling") have, again for purposes of analysis, utilized change scores
but as derived from calculating differences in posttest-1 to posttest-2
performance (Type B change scores). Type A change scores measure
learning (and/or "growth"); Type B change scores assess retention
(and/or "conceptual modification").

Discussions below are concerned with issues and problems
traditionally related to Type A change scores and will follow quite closely
in topical coverage Linn and Slinde's (1977) comprehensive review
article entitled, "The Determination of the Significance of Change
Between Pre- and Posttesting Periods." The two types of change scores
reviewed here include: first, "raw change" (or "difference") scores
and, second, "estimated true change" scores. These discussions are
followed by a short section dealing with Type B change score issues.






Raw change scores. The simple raw change or difference score is
"the most natural measure of change from one point in time to another"
(Linn & Slinde, 1977, p. 122). Swimmers would be interested in
assessing the difference between pretraining speed and speed after some
amount of training; golfers, on the other hand, would be concerned with
the number of strokes they were able to take off their game as a
consequence of being coached by the club professional. The raw change
score obviously is quite easy to calculate, but this simplicity belies
the methodological complexities associated with its use.

There are several major areas of concern related to raw change
scores. One serious problem with the use of raw change scores is that
they typically are negatively correlated with the pretest (Bereiter,
1963; Linn & Slinde, 1977; Thorndike, 1966). This dependency
relationship is more commonly referred to as the problem of regression toward
the mean (Guilford, 1954; Herriott & Muse, 1973; Marcus, Keesling, Rose,
& Trent, 1972). An implication of this problem is that students with
low pretest scores are more likely to obtain large positive gains, while
students with high pretest scores are less likely to do the same and
perhaps show a loss.
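
The negative dependency can be illustrated with a short calculation. The sketch below is not drawn from the sources cited above; it assumes only the standard definitions of variance and covariance, and the function name and its arguments are illustrative. It computes the correlation between the raw gain (posttest minus pretest) and the pretest from the two standard deviations and the pretest-posttest correlation.

```python
import math


def gain_pretest_correlation(sd_pre, sd_post, r_pre_post):
    """Correlation between raw gain (post - pre) and the pretest.

    Illustrative helper; the name and signature are assumptions made for
    this sketch. For D = Y - X it uses
        cov(D, X) = r * sx * sy - sx**2
        var(D)    = sx**2 + sy**2 - 2 * r * sx * sy
    """
    cov_gain_pre = r_pre_post * sd_pre * sd_post - sd_pre ** 2
    var_gain = sd_pre ** 2 + sd_post ** 2 - 2 * r_pre_post * sd_pre * sd_post
    if var_gain <= 0:
        raise ValueError("gain scores have no variance")
    return cov_gain_pre / (sd_pre * math.sqrt(var_gain))


# Hypothetical values: equal spread at both testings, correlation of .70.
print(round(gain_pretest_correlation(10.0, 10.0, 0.70), 3))
```

When the two standard deviations are equal, the expression reduces to the negative square root of (1 - r)/2, so any pretest-posttest correlation below 1.0 yields a negative value. If the posttest spread is much larger than the pretest spread, the sign can reverse, which is one reason the text says "typically" rather than "always."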

Bereiter (1963) correctly indicates that "some kind of correction
is called for" (p. 3). However, he also notes that attempts to correct
for the regression toward the mean effect have led to what he calls the
"under-correction/over-correction dilemma." He referred to Garside's
(1956) article in which three methods of solving for the regression
of gains on initial scores were studied. Garside's results were
inconsistent; that is, with one method the regression estimate increased
as the correlation between pretest and posttest increased, with another
it decreased, and the third method was indifferent to this correlation.
In another instance, Campbell and Erlebacher (1970) succinctly
illustrated how biased adjustments could make the gains for one group
look larger in relation to gains for other groups. Such results,
they suggest, usually will occur when groups are constituted in such
a way that the pretest scores for the groups are significantly
different from one another, as is the case in many quasi-experimental
studies.

It can be noted that researchers generally agree that none of the
alternatives offered to correct for biases resulting from
regression effects provides a fully satisfactory solution to the problem
(e.g., Cronbach & Furby, 1970; Linn & Slinde, 1977).

Another problem with raw change scores is unreliability (Bereiter,
1963; Linn & Slinde, 1977; Lord, 1963). Linn and Slinde have
illustrated vividly that the reliability of raw change scores is a function
both of the reliability of the pretest and posttest and of their
intercorrelation. Raw change score reliability increases as the
reliability of the pretest and posttest increases, but decreases as
their intercorrelation increases.
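
The relationship can be sketched with the classical test theory expression for the reliability of a difference score. This is an illustration under stated assumptions, not a reproduction of Linn and Slinde's tables: errors of measurement on the two occasions are assumed to be uncorrelated, and the function name, arguments, and input values are hypothetical.

```python
def raw_change_reliability(rel_pre, rel_post, r_pre_post, sd_pre=1.0, sd_post=1.0):
    """Reliability of the raw change score D = post - pre.

    Classical test theory sketch, assuming uncorrelated measurement
    errors, so the true-score covariance equals the observed covariance:
        true variance of D     = sx^2*rxx + sy^2*ryy - 2*rxy*sx*sy
        observed variance of D = sx^2     + sy^2     - 2*rxy*sx*sy
    """
    cross = 2 * r_pre_post * sd_pre * sd_post
    true_var = sd_pre ** 2 * rel_pre + sd_post ** 2 * rel_post - cross
    observed_var = sd_pre ** 2 + sd_post ** 2 - cross
    if observed_var <= 0:
        raise ValueError("change scores have no variance")
    return true_var / observed_var


# Hypothetical tests, each with reliability .80, at several intercorrelations.
for r in (0.30, 0.50, 0.70):
    print(r, round(raw_change_reliability(0.80, 0.80, r), 3))
```

With equal standard deviations the expression simplifies to the mean of the two reliabilities minus the intercorrelation, divided by one minus the intercorrelation, which makes both tendencies described above visible at a glance.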

Linn and Slinde (1977) indicate that one implication of the
unreliability of raw change or "difference" scores is that "it is quite
risky to make any important decisions about individuals on the basis
of gains from pre- to posttesting periods" (p. 124). To determine
the trust one should have in raw change scores and, thereby, to reduce
risks, Lord (1963) has recommended the computation of the correlation
between observed change and true change, or between estimated true
change and true change, or both. He indicated that this estimate
should be calculated prior to analysis proper to be sure that the
observed change scores are not simply the result of random fluctuations
or so obscured by random fluctuations as not to be worthy of analysis.

Bereiter (1963), in an attempt to improve the reliability of the
raw change score, introduced the "change item" concept and procedure.
The change item was defined "as an item that is administered to the
same person on two occasions and scored directly for direction and
perhaps amount of change" (p. 10). The procedure produces, as a
by-product, a lowered intercorrelation between the pretest and
posttest while perhaps even raising the reliability of each. It follows
that a possible outcome of using this procedure would be an increase
in the reliability of the raw change score.

The very notion of increasing the reliability of the raw change
score by decreasing the intercorrelation of the pretest and posttest
raises another issue that Bereiter refers to as the
"unreliability-invalidity dilemma." The dilemma posits that an increase in the
reliability of the raw change score brought about by a decrease in the
pretest-posttest intercorrelation also tends to lower the validity
of the measure itself; that is, because of low intercorrelation, the
same instrument administered as a pretest and posttest may be said to
be measuring different things. Despite the above, Bereiter believes
that the use of the change item practice is an admissible one for
increasing the reliability of raw change scores.

Two other issues which are corollaries to the raw change score
problems mentioned above seem worthy of note. First, the correlation
of a raw change score with another variable that is in part a function
of the pretest or posttest is, because the same errors of measurement
are present in both quantities being correlated, usually considered
spurious (Lord, 1963). When raw change scores are correlated with the
pretest, a spurious negative correlation usually is obtained.

Second, unreliability has the effect of attenuating correlations
(Lord, 1963). The implication of this is that correlations involving
a raw change score having low reliability will tend to be quite low.
Linn and Slinde (1977) noted that this is rather a discouraging
implication for educational researchers interested in finding correlates of
change.
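
The size of the attenuation can be sketched with the classical correction-for-attenuation relationship, under which an observed correlation equals the correlation between true scores multiplied by the square root of the product of the two reliabilities. The function names and the input values below are hypothetical and serve only to illustrate the point.

```python
import math


def attenuated_correlation(true_r, rel_x, rel_y):
    """Observed correlation expected when both measures contain error.

    Classical attenuation relationship: r_observed = r_true * sqrt(rxx * ryy).
    """
    return true_r * math.sqrt(rel_x * rel_y)


def disattenuated_correlation(observed_r, rel_x, rel_y):
    """Classical correction for attenuation (the inverse of the above)."""
    return observed_r / math.sqrt(rel_x * rel_y)


# Hypothetical case: a true correlation of .50 between change and some
# correlate, a change-score reliability of .33, and a correlate
# reliability of .90.
observed = attenuated_correlation(0.50, 0.33, 0.90)
print(round(observed, 3))
print(round(disattenuated_correlation(observed, 0.33, 0.90), 3))
```

Even a moderate true relationship shrinks substantially once the low reliability of the change score enters the product, which is the discouraging implication noted above.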

Estimated true change scores. An alternative approach to that
of the raw change score is to estimate "true" change, that is, the
change that would obtain if there were no error of measurement (Linn
& Slinde, 1977). As conceived of by Lord (1956, 1958, 1963) and by
McNemar (1958), true change may be estimated by using multiple
regression procedures based on estimates of the reliabilities of the
pretest and posttest, their variances, and their covariance. The
Lord-McNemar argument was extended by Cronbach and Furby (1970) in an attempt
"to get a still better estimate" (p. 68). By distinguishing, as did
Stanley (1967), between independent and linked measures (i.e., ones with
correlated errors) and by suggesting the use of other available measures
as predictors, Cronbach and Furby substantially advanced methodological
theory in this area.

Cronbach and Furby suggested that a more precise estimate of the
true score could be obtained by adding one or more available measures
to the least squares estimation. In a study of this issue, Marks and
Martin (1973) found that the precision of an extended pretest estimator
of true change is an increasing function of the correlation between
true change and the true score on the additional measures. More recently,
Tatsuoka (1975) decomposed the squared multiple R of the least-squares
estimate of true-score difference into the reliability of the difference
score and the increment due to other predictors, which is always
nonnegative. Therefore, adding predictors increases the precision of
estimation.

The distinction made by Cronbach and Furby between linked and
independent measures led to the development of different formulas for
estimating the reliabilities of raw change scores and true change.
The formulas likewise require that a distinction be made between
linked and independent pretest-posttest correlations. In a study
of Cronbach and Furby's reasoning, Marks and Martin (1973) found that,
as predicted, the magnitude of the correlations between true change
and pretest true scores had a pronounced effect upon the precision
of true change estimation. They also noted an analogue to Bereiter's
(1963) unreliability-invalidity dilemma in respect to true change
estimation. It was their suggestion that "as a general rule of thumb,
the investigator computing true gain estimates should employ only test
forms with reliabilities in excess of .85 and especially so if the
true gain-initial true score correlation is expected or found to
be .70 or less" (p. 190).

An estimated true change score has some advantages over a raw
change score. For one, the reliability of the estimated true change
score is as large as or larger than the reliability of a raw change
score (Tatsuoka, 1975). In addition, Lord (1963) has empirically
shown that when estimated true change scores are used in lieu of raw
change scores, persons with relatively high pretest scores are more
likely to be among those with large gains. The estimated true change
scores, therefore, obviate the objection that raw change scores tend
to favor persons with low pretest scores (i.e., the regression effect).

The Enduring Effects of Schooling

Only a small number of studies have been concerned with Type B
change scores and even fewer with "after-schooling" effects in the
cognitive domain (e.g., Dahllof, 1960; Harnqvist, 1968). Harnqvist (1977),
while rightfully indicating that this is a neglected area in
educational research, also cautions the researcher against the use of repeated
measurements:



1. It is not easy to retrieve information even if it is there,
somewhere in the long-term store. In a long-postponed
measurement of retention, more and different types of cues
are likely to be needed, and therefore a repeated
measurement with the same instrument . . . directly after learning
is not very informative or fair.

2. Since information is not just stored away until it is
retrieved, but undergoes qualitative changes in the meantime,
other things are likely to come out from the store than
those originally put in, and such changes are not just
distortions by a faulty memory but might very well be
improvements also.

3. For both reasons a quantitative measurement of gains and
losses over time is likely to be misleading. Only on a
superficial operational level is there a difference between
two comparable things. (p. 9)



Residuals as Indicators of Schooling Effects 

The residual score, obtained by subtracting the predicted
criterion score from the corresponding observed score (DuBois, 1957),
has been widely used in recent schooling effect studies (e.g.,
McDonald & Elias, 1976; Soar, 1973). Residualizing removes from the
criterion score the portion that could have been predicted linearly
from predictors or covariates. The residual score, therefore, has a
zero correlation with the covariate and consequently does not give an
advantage to persons with certain values on the covariate measures
(Linn & Slinde, 1977).
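
The zero-correlation property can be checked numerically. The following is a minimal sketch using ordinary least squares with a single covariate; the scores are invented for illustration, and the helper names are assumptions rather than procedures taken from the studies cited.

```python
def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def residual_scores(covariate, criterion):
    """Observed criterion minus the criterion predicted linearly
    from the covariate (simple least-squares regression)."""
    slope = covariance(covariate, criterion) / covariance(covariate, covariate)
    intercept = sum(criterion) / len(criterion) - slope * sum(covariate) / len(covariate)
    return [y - (intercept + slope * x) for x, y in zip(covariate, criterion)]


# Hypothetical pretest (covariate) and posttest (criterion) scores.
pretest = [40, 45, 50, 55, 60, 65]
posttest = [52, 50, 58, 57, 66, 64]

residuals = residual_scores(pretest, posttest)
raw_gains = [post - pre for pre, post in zip(pretest, posttest)]

print(covariance(pretest, residuals))  # zero, up to rounding error
print(covariance(pretest, raw_gains))  # negative for these data
```

For these invented scores the raw gains covary negatively with the pretest, while the residuals do not covary with it at all, which is the sense in which the residual score gives no advantage to students with particular covariate values.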

Residual scores. To avoid confusion, one should distinguish
between two types of residual scores that differ according to the
nature of the predictors used in computing predicted criterion scores.
In one case, predictors are obtained from measures other than
criterion measures, and in the other case, the same measures are
repeatedly used in determining both the predictor and criterion measures
(i.e., pretest-posttest). The latter type of score is often
called the "residual gain score." Cronbach and Furby (1970) are of
the opinion that the residual score is not a corrected measure
of gain. It is, they say, "primarily a way of singling out individuals
who changed more (or less) than expected" (p. 24).

The first type of residual score was used in the schooling
effects studies of Dyer (1970) and Astin and Panos (1966). In contrast,
Soar (1973) and McDonald and Elias (1976) used the so-called "residual
gain score" in their investigations of process-product relations.

With residual scores the effects of covariates have been partialled
out from the criterion variables, yet the residual score still has the
same unreliability problem as does the raw change. Linn and Slinde
(1977) showed that residual score reliability was a function of the
reliability of pretest and posttest scores and of their
intercorrelations. Although the reliability coefficients of residual scores
are somewhat better than those of the corresponding raw change scores,
they were still small whenever the correlation of pretest and posttest
scores was large. The same cautions, therefore, that held for the
unreliability of raw change scores must also apply to residual scores.
And, since the problem of unreliability prevails with the residual
scores, researchers are warned to correct for attenuation when computing
partial correlations as well (Bereiter, 1963; Linn & Werts, 1973).

True residual gain scores. It has been noted that raw change or
gain is to true gain as residual gain is to true residual gain. This
relationship was used by Tucker, Damarin, and Messick (1966) in their
attempt to draw attention to the "true residual gain" score which they
referred to as a "basefree measure of change." They proposed to divide
the true gain score into two components, one entirely dependent on
the true score of the first, or baseline test, and one entirely
independent of it (i.e., a true predicted gain and a true residual gain).
Tucker et al. (1966) developed equations for estimating both of these
components. However, Cronbach and Furby (1970) correctly criticized their
proposals and in the process demonstrated a better way to estimate
the true residual gain.

The Advantages and Disadvantages of Status Attainment
and Differences as Indicators of Schooling Effects

Arguments for and against the use of status attainment or
difference indicators in schooling effects studies are many and varied.
And, as is often the case, arguments for one are based on arguments
against the other. For example, in opposing the use of change measures,
Cronbach and Furby (1970) suggested that if one is testing the null
hypothesis that two treatments have the same effect, the essential
question is whether posttest average scores (i.e., status attainment
or outcome scores) vary from group to group. They found no occasion
in which the change score should be estimated in educational research
and concluded: "It appears that investigators who ask questions
regarding gain scores would ordinarily be better advised to frame their
question in other ways" (p. 5?). Linn and Slinde (1977) concurred:
"The virtues in doing so [in measuring change] are hard to find.
Major disadvantages in use of change scores are that they tend to
conceal conceptual difficulties and they can give misleading results"
(p. 147). Linn and Slinde, following Cronbach and Furby, further
recommended that "Where appropriate, regression analyses that treat
the pretest no differently from other independent variables (or
predictors), and the posttest as the dependent variable, avoid many of the
difficulties that are introduced by gain scores" (p. 148).

In contrast, some methodologists hold the opinion that the
task for the researcher is not to eliminate the use of differences as
indicators of effect, but rather to find ways to minimize the problems
their use creates. Bereiter (1963), for one, has described a number
of ways by which problems associated with difference scores can be
reduced. It is his argument that: (1) unreliability in pretest scores
should be corrected before posttest scores are regressed on pretest
scores, (2) the meaningfulness of change scores does not depend
on a test's measuring "the same thing" on two occasions, and
(3) measuring changes directly as subjective dimensions which do not
necessarily have underlying physical continua is the only way that
permits interpretable comparisons on these
dimensions for individuals with different initial standings.



In attempts to minimize further problems with difference
indicators, researchers have developed better ways of estimating true
change and true residual change (e.g., Cronbach & Furby, 1970;
O'Connor, 1972) and have found that the use of group membership
information treated as a dummy variable in a regression analysis improves
the fairness of the estimators (Lord, 1963; McNemar, 1975).
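
The regression alternative to gain scores described above can be
sketched as follows. This is only an illustration under stated
assumptions: the eight scores are hypothetical, the posttest is
generated without error so that the fitted weights are exact, and the
variable names are not taken from any of the cited studies. The pretest
enters as an ordinary predictor and group membership as a dummy
variable; the mean raw gain difference is computed for comparison.

```python
import numpy as np

# Hypothetical data: the first four students are untreated (group = 0),
# the last four are treated (group = 1).
pretest = np.array([40.0, 45.0, 50.0, 55.0, 42.0, 47.0, 52.0, 57.0])
group = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

# Posttest generated without noise: intercept 5, pretest slope 0.8,
# treatment effect 3. These values are assumptions for illustration.
posttest = 5.0 + 0.8 * pretest + 3.0 * group

# Design matrix: intercept, pretest (treated like any other predictor),
# and the group-membership dummy variable.
X = np.column_stack([np.ones_like(pretest), pretest, group])
coef, *_ = np.linalg.lstsq(X, posttest, rcond=None)
intercept, pretest_slope, treatment_effect = coef

# For comparison: difference between the groups in mean raw gain.
raw_gain = posttest - pretest
gain_difference = raw_gain[group == 1].mean() - raw_gain[group == 0].mean()

print(f"regression estimate of treatment effect: {treatment_effect:.2f}")
print(f"difference in mean raw gain:             {gain_difference:.2f}")
```

Because the treated group in this made-up example starts slightly
higher and the pretest slope is below one, the raw gain comparison
understates the effect that the regression recovers.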

In the opinion of the authors of this review, it is not yet
possible to study the schooling effects that concern educators most
without sometimes resorting to the analysis of differences. Despite
their limitations, there are specifiable conditions when the analysis
of difference scores is more appropriate than the analysis of status
attainment or outcome scores. The remainder of this section is used
to describe the conditions of use for the two major categories of
indicators of effect and to discuss the relative advantages and
disadvantages of each.



Status Attainment or Outcome Indicators of Effect

In schooling effects studies, measurements taken at a single
point in time, usually after some intervention has taken place, are
referred to as status attainment or outcome indicators.

Conditions of use. Status attainment or outcome measurements
are appropriate indicators of schooling effects when: (1) initial
student differences are expected to have little influence on
later status or outcomes (e.g., studies of long-term effects and
mastery programs); (2) initial student differences on crucial
variables have been controlled (for example, by random assignment
of students to treatment conditions); and (3) there is no intent to
attribute schooling effects specifically to schools, programs, and/or
teachers (e.g., state assessment programs).

Relative advantages. Status attainment or outcome measurements
generally are easier to collect, store, and process than other
types of measurements. If initial student differences on crucial
variables are satisfactorily controlled, then the test of differences
between treatment conditions on posttest indicators is straightforward
and easy to interpret.

Relative disadvantages. In most educational settings, it is
difficult to control for initial student differences, and as a
consequence, it is not possible to attribute meaningful schooling
effects specifically to schools, treatments, and/or teachers. This
is an especially sensitive issue for those interested in short-term
process-product research. Randomization, frankly, does not always
work, and even when it does, there is no guarantee that selective
attrition will not occur later on to bias the results.

Status attainment or outcome measurements are also affected
adversely by test unreliability. The most effective corrective
procedure is to select reliable measures in advance.



Difference Indicators of Effect 

In schooling effects studies, adjusted or unadjusted scores
representing differences between measurements taken at two points
in time, with the interval usually being filled by some intervention
strategy, are referred to as "change" or "gain" indicators. Adjusted
or unadjusted scores representing differences between predicted and
observed measurements are referred to as "residual" indicators.
Change and residual indicators have been treated in this paper as
subcategories of "differences" indicators and, in the discussion to
follow, will be distinguished only when warranted.

Conditions of use. The analysis of differences is appropriate
when the researcher anticipates that initial student differences cannot
be fully controlled and will thus influence outcomes in ways that
will prevent a clear interpretation of causality. The influence of
initial student differences on such short-term effects as reading and
mathematics achievement is usually considerable and serves as an example
of the appropriate use of differences as indicators of effect.
Difference indicators may also be appropriately used in relatively
long-term studies. In this regard, Stallings and Kaskowitz (1974) report
that the degree of the regression effects can be drastically reduced
in a longitudinal study.



Relative advantages. The use of difference scores permits
researchers to statistically control (as best they may) initial
student differences.

Relative disadvantages. Because difference scores are derived
from two potentially fallible measurements, they are usually
unreliable. When differences between repeated measurements are calculated,
they are likely to be affected by the phenomenon known as regression
toward the mean. In attempting to correct for the effects of
regression toward the mean, the researcher will usually be faced with
the so-called over-correction/under-correction dilemma. This problem
was discussed earlier.
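
The unreliability of difference scores can be illustrated with the
standard classical test theory expression for the reliability of a raw
difference D = Y - X. The formula is not stated in this review and is
supplied here as an assumption; it presumes that errors of measurement
on the two occasions are uncorrelated. The function name and the
reliability values below are hypothetical.

```python
def difference_score_reliability(r_xx, r_yy, r_xy, sd_x=1.0, sd_y=1.0):
    """Reliability of D = Y - X under classical test theory.

    r_xx, r_yy : reliabilities of pretest and posttest
    r_xy       : observed pretest-posttest correlation
    sd_x, sd_y : observed standard deviations
    Assumes measurement errors on the two occasions are uncorrelated.
    """
    covariance_term = 2.0 * r_xy * sd_x * sd_y
    true_variance = sd_x**2 * r_xx + sd_y**2 * r_yy - covariance_term
    observed_variance = sd_x**2 + sd_y**2 - covariance_term
    return true_variance / observed_variance


# Two equally reliable tests (0.90), with a low and a high
# pretest-posttest correlation.
low_corr = difference_score_reliability(0.90, 0.90, 0.30)
high_corr = difference_score_reliability(0.90, 0.90, 0.80)

print(f"r_xy = 0.30: reliability of difference = {low_corr:.2f}")
print(f"r_xy = 0.80: reliability of difference = {high_corr:.2f}")
```

With both tests fixed at a reliability of .90, raising the
pretest-posttest correlation from .30 to .80 lowers the reliability of
the difference considerably, which is the pattern the text describes.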

In addition to the above, difference scores are more difficult
and costly to come by. And, while the raw gain score is relatively
easy to calculate, the remaining types of difference scores are much
more difficult to derive.

Summary. The authors' review of the literature indicates that
although there are considerable objections to the use of differences
as indicators of schooling effects, there are conditions under which
they are more or less appropriate to analyze. It is also the case
that recent developments in this area have tended to reduce the force
of earlier objections.

In short, certain forms of adjusted difference scores seem to
be appropriate indicators in the study of schooling effects,
especially for the study of such relatively short-term effects as
student achievement in reading and mathematics.



III. ISSUES AND PROBLEMS RELATED TO STUDY APPROACHES
IN RESEARCH ON SCHOOLING EFFECTS

Cronbach (1957), in his presidential address to the American
Psychological Association, indicated that there were two historic
streams of method, thought, and affiliation in psychology:
experimental and correlational psychology. These same study approaches have
been evident in educational research over the years and still
represent the predominant methodologies in investigations of schooling
effects (Alwin, 1976; Gage, 1976a; Herriott & Muse, 1973). This section
will review issues and problems related to study approaches in research
on schooling effects; specifically, the "experimental" and
"nonexperimental" approaches to the study of schooling effects will be
discussed below. Also discussed in a subsequent subsection are two
relatively new methodological developments in educational research;
that is, aptitude-treatment interactions and meta-analysis. A final
subsection is devoted to a summary of review findings in this area.



Experimental Approaches



Experimental approaches to educational research are characterized
by attempts to manipulate experimental variables while tightly
controlling relevant situational variables. True experimental designs
permit researchers to perform rigorous tests of hypotheses and to reject
those hypotheses that are less tenable. In these designs, the random
assignment of experimental units to treatment and control conditions
is used as a means of attaining initial group equivalency on crucial
variables. However, experience suggests that it is almost impossible
to assign individual students randomly to treatment and control
conditions in most educational situations. Also, in the natural
setting of the classroom or school the researcher seldom has full
control over situational and/or experimental variables. Under such
conditions, researchers may use alternative designs, that is,
quasi-experimental designs.

Campbell and Stanley (1963) distinguish between three sets of
experimental designs: pre-experimental, true experimental, and
quasi-experimental. Identified as pre-experimental designs are the: (1)
one-shot case study, (2) one-group pretest-posttest design, and (3)
static-group comparison. True experimental designs include the: (1)
pretest-posttest control group design, (2) Solomon four-group design, and
(3) posttest-only control group design. The ten designs classified as
quasi-experimental include the: (1) time series, (2) equivalent time samples
design, (3) equivalent materials samples design, (4) non-equivalent
control group design, (5) counter-balanced design, (6) separate-sample
pretest-posttest design, (7) separate-sample pretest-posttest control group
design, (8) multiple time series, (9) institutional cycle design, and
(10) regression discontinuity.

With two exceptions, the above listed designs can be assigned to one
of two major categories distinguished by the number of groups involved
in the study, namely, one-group designs (i.e., experimental
group only) or multiple group designs (i.e., control groups). Listed
in the upper portion of Table 1 are designs classified according to
this subdivision of the study approaches dimension and according to
the indicators of schooling effects dimension (i.e., status
attainment or differences). This latter dimension was depicted earlier
in Figure 1.

Campbell and Stanley discussed the strengths and weaknesses of
each of the sixteen designs in terms of internal validity (i.e.,
interpretability) and external validity (i.e., generalizability). The
eight factors jeopardizing internal validity are: (1) history,
(2) maturation, (3) testing, (4) instrumentation, (5) statistical
regression, (6) selection, (7) mortality, and (8) selection-maturation
interaction. The factors jeopardizing external validity are:
(1) interaction effects of testing, (2) interaction effects of
selection and treatment, (3) reactive effects of experimental
arrangements, and (4) multiple-treatment interference. Readers are
advised to refer to Campbell and Stanley (1963) for a full discussion
of the strengths and weaknesses of each design. The strengths and
weaknesses of the experimental group only designs and of control
group designs are, however, reviewed briefly below.

Experimental Group Only Designs 

Pre-experimental designs such as the one-shot case study and
the one-group pretest-posttest design and quasi-experimental designs
such as time series, equivalent time samples design, and equivalent
materials samples design are classified here as experimental group
only designs, simply because they lack a control group.

Table 1

Research Designs and Analytical Models for Schooling Effects Studies

STUDY APPROACHES, by INDICATORS OF SCHOOLING EFFECTS
(status attainment or differences)

Experimental Approaches*

  Control Group Design
    Status attainment:
      . Static-Group Comparison (t-test)
      . Posttest-Only Control Group Design (t-test, ANOVA with
        blocking, ANCOVA)
      . Equivalent Materials Design (ANOVA)
    Differences:
      . Pretest-Posttest Control Group Design (ANCOVA, Repeated
        Measures ANOVA)
      . Solomon Four-Group Design (2 x 2 ANOVA of posttests)
      . Nonequivalent Control Group Design (Regression-Discontinuity
        Model, Regression Projection Model, Generalized Multiple
        Regression Model)
      . Counterbalanced Designs (Latin-square ANOVA)
      . Separate-Sample Pretest-Posttest Control Group Design (ANOVA)
      . Multiple Time Series Design

  Experimental-Group-Only Design
    Status attainment:
      . One-Shot Case Study (Common-knowledge Comparisons)
    Differences:
      . One-Group Pretest-Posttest Design (Norm-referenced
        Comparisons)
      . Time Series
      . Equivalent Time-Samples Designs (nested design ANOVA)
      . Separate-Sample Pretest-Posttest Design (t-test)

Nonexperimental Approaches

  Non-linear Regression Methods
    Status attainment:
      . Polynomial regression analysis
    Differences:
      . Polynomial regression analysis

  Partitioning of Explained Variance
    Status attainment:
      . Incremental Partitioning of Variance
      . Commonality Analysis
    Differences:
      . Incremental Partitioning of Variance
      . Commonality Analysis
      . Partitioning Residual Criterion Variance

  Comparison of Regression Coefficients
      . Multiple regression analysis (b and beta coefficients)

  Causal Models
      . Path Analysis
      . Recursive Model

*Fourteen of the 16 designs in Campbell and Stanley (1963) are
classified in this table.

The two pre-experimental designs classified among experimental
group only designs happen to be the weakest designs among the sixteen
listed by Campbell and Stanley on both internal and external validity
criteria. They lack control over almost all sources of invalidity.
The one-group pretest-posttest design is, however, free from selection
and mortality biases. If in a study involving a one-group
pretest-posttest design a standardized test is administered around the
norming date (Tallmadge & Horst, 1976), then the norm-group comparisons
design, which controls for overall sources of internal invalidity,
can be used to evaluate the school effects.

Three quasi-experimental designs classified as experimental
group only designs (that is, time series, equivalent time samples
design, and equivalent materials samples designs) have control over
most sources of internal invalidity. However, these designs still
lack control over sources of external invalidity.

Control Group Designs

Ten of the sixteen designs listed by Campbell and Stanley involve at
least one control or comparison group and may be classified as control
group designs. A control group in research designs tends to reduce
(or control) confounding effects from such sources as history,
maturation, testing, instrumentation, and regression. By using such
complicated quasi-experimental designs as the separate-sample
pretest-posttest design and the separate-sample pretest-posttest control
group design, researchers may increase the external validity of
findings.

Of some interest is Ary and Carlson's (1975) flowchart for
selecting Campbell and Stanley designs, which takes into account
threats both to internal and external validity. It is a helpful aid
for the novice researcher in deciding upon the most appropriate
strategy for a given research effort.

Experimental Approaches in Schooling Effects Studies

Inspection reveals that only five of thirteen possible study
types identified by Stufflebeam and Webster (1978) employ
experimental or quasi-experimental designs as methods for evaluating
educational programs. The study types are: (1) public relations
inspired studies, (2) experimental research studies, (3) policy
studies, (4) decision-oriented studies, and (5) consumer-oriented
studies. Two other experimental approaches may be added to this
list: preplanned variation evaluation studies (Light & Smith, 1971)
and a procedure proposed by Tatsuoka (1972) for evaluating nationwide
intervention programs.

From an examination of Stufflebeam's (1971) CIPP Evaluation
Model, one may infer that true-experimental designs have limited
utility in educational evaluation. Difficulties in meeting
assumptions of constancy of experimental treatment across both
subjects and time and the inability to assign students randomly to
experimental and control groups provide ample reasons for this view
(Stufflebeam, 1969, 1971).

Tatsuoka (1972), however, has an altogether different viewpoint.
Relative to the constancy requirement in experimental treatments,
Tatsuoka observed:

An educational program is, by its very nature, an entity 
that is in perpetual flux. This fluid, dynamic entity, 
with all its periodic modifications and refinements is 
the treatment. Nothing in experimental design forbids 
such types of treatment, (p. 3) 

Tatsuoka admitted that under the present educational system
random assignment of individual students to treatment and control
conditions is difficult. Nevertheless, since students, in his view,
are not appropriate units to study in large scale program
evaluations, the problem is not a real one. He argued that "classes,
schools, or even school districts are the proper units, and random
assignment of these to the conditions is not nearly so infeasible as
that of students" (p. 2).

Many researchers engaged in nonexperimental classroom process
studies and teacher effectiveness studies (e.g., Brophy & Evertson,
1974) admit the need for experimental studies to test hypotheses
generated via correlational studies. A general consensus among
educational researchers is that hypotheses derived from educational
theories and instructional models regarding relationships between
student achievement and contextual and instructional process variables
should be verified via experimental approaches.

Cronbach (1957) has noted that a distinctive characteristic of
modern experimentation is the statistical comparison of effects.
The early development of techniques in comparative experimentation
is succinctly documented by Cochran (1976). Analytical methods are
described in many reference sources such as Edwards (1972), Hays
(1963), Kirk (1968), and Winer (1971), among others. Multivariate
versions of statistical comparison are described in Anderson (1958),
Bock (1975), Cooley and Lohnes (1971), Finn (1974), Tatsuoka (1971),
Timm (1975), and elsewhere.

Tatsuoka and Tiedeman (1963) developed a schema for presenting
statistical techniques in relation to educational research based on
the role (i.e., dependent or independent), number (i.e., one or
more than one), and scale type (i.e., nominal, ordinal, or interval)
of variables involved. Among the listed statistical techniques are
multiple regression, analysis of variance and covariance, and such
non-parametric statistics as the sign test, median test, Mann-Whitney
U test, Kruskal-Wallis one-way ANOVA, Friedman's two-way ANOVA,
Chi-square test, Hotelling's $T^2$, McNemar's test for significance of
changes, and Cochran's Q test for several related proportions. These
represent most of the methods that can be used in testing statistical
hypotheses (usually null hypotheses) in an experimental approach.
Their schema provides researchers with a reference point in selecting
appropriate statistical analysis methods.

Another practical guide, advanced by Tallmadge and Horst (1975),
listed five evaluation models named after appropriate analytical
models: (1) posttest comparison with matched groups, (2) covariance
analysis, (3) special regression, (4) generalized regression, and
(5) norm-referenced. A decision tree constructed to aid in the
selection of the most appropriate model for the conditions of the
proposed evaluation is provided.

Tallmadge and Horst discussed the strengths and weaknesses of 
each model and provided an analytical method for testing statistical 
significance of the difference between experimental and control 
group mean scores. They also advanced the notion of educational 
significance, even though it remained a subjective criterion. These 
authors suggest that "if the observed posttest scores exceed the no- 
treatment expectation by one-third of a standard deviation, the
treatment effect be considered educationally significant" (p. 69).
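
The one-third standard deviation criterion can be expressed as a short
check. This is a sketch only: the function name, its arguments, and the
example scores are hypothetical, and the choice of which standard
deviation to use is left to the evaluator, as the quotation does not
specify it here.

```python
def is_educationally_significant(observed_mean, expected_mean, sd,
                                 threshold=1.0 / 3.0):
    """Return (flag, effect) for the one-third standard deviation rule.

    observed_mean : observed posttest mean of the treated group
    expected_mean : no-treatment expectation for the same group
    sd            : standard deviation used to scale the difference
    """
    effect_in_sd_units = (observed_mean - expected_mean) / sd
    return effect_in_sd_units >= threshold, effect_in_sd_units


# Hypothetical example: expectation of 50 with a standard deviation of 10.
flag_large, effect_large = is_educationally_significant(54.0, 50.0, 10.0)
flag_small, effect_small = is_educationally_significant(52.0, 50.0, 10.0)

print(flag_large, round(effect_large, 2))
print(flag_small, round(effect_small, 2))
```

A gain of four points clears the threshold in this example while a gain
of two points does not, regardless of whether either difference is
statistically significant.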



Nonexperimental, Correlational Approaches

According to Cronbach (1957), correlational approaches to
educational research are intended for the study of natural
relationships. While experimenters are interested only in the variation they
themselves create, correlators are interested in the already existing
variation between individuals, social groups, and species. It is
the correlator's mission to observe and organize data from nature's
experiments and in the process to describe the ways by which
variables covary. Thus, for example, researchers using statistical
devices such as correlational coefficients can study the ways in
which teacher behaviors are related to student outcomes on reading
and mathematics tests of achievement. Such relationships may be
found to be positive, neutral, or negative, and linear or nonlinear.

The correlator has access to a variety of correlational
methods, and most of these have been described by Tatsuoka and
Tiedeman (1963). A table listing these statistical techniques
classified according to the role, scale type, and number of
variables involved has also been developed by these researchers.
Listed on their table are methods ranging from the contingency
coefficient "Q" to canonical correlation.

The set of correlational techniques described in Table 1 of
this section is not intended to cover all of the methods dealt with
by Tatsuoka and Tiedeman. In fact, it is limited to regression
techniques associated with Pearson's product-moment correlation
coefficient "r."

Correlational methods used in studies of educational effects are
grouped into four categories in Table 1 and include: (1) partitioning
of explained variance, (2) comparison of regression coefficients,
(3) nonlinear regression methods, and (4) causal models. These four
categories, which are discussed below, although not mutually exclusive,
do differ in the method of correlational analysis used (usually
regression analysis) and in their emphasis on different statistics
obtained from the analysis.

Partitioning of Explained Variance

In regression models, the square of Pearson's product-moment
correlation, $r^2$, is interpreted as the proportion of variance in the
dependent variable that is accounted for by the independent variable.
The analogue to $r^2$ in cases of multiple independent variables is $R^2$,
the squared multiple correlation. When an $R^2$ is obtained in
experimental research with balanced designs where predictors are
independent of each other, the $R^2$ is equal to the sum of the squared
zero-order correlations between each predictor and the criterion
variable. Under such conditions, there is no ambiguity as to the
amount of variance accounted for by a given predictor (Darlington,
1968).
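
The additivity of squared zero-order correlations under independent
predictors can be checked numerically. The sketch below assumes a
hypothetical balanced 2 x 2 design with effect-coded (-1, +1)
predictors and made-up criterion scores; `r_squared` is an illustrative
helper, not a function from any cited source.

```python
import numpy as np


def r_squared(X, y):
    """R^2 from an ordinary least squares fit of y on X plus an intercept."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    total = y - y.mean()
    return 1.0 - (residual @ residual) / (total @ total)


# Balanced 2 x 2 design with two observations per cell; the two
# effect-coded predictors are exactly uncorrelated.
x1 = np.array([-1, -1, -1, -1, 1, 1, 1, 1], dtype=float)
x2 = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
y = np.array([10, 12, 15, 13, 18, 16, 22, 24], dtype=float)

r1 = np.corrcoef(x1, y)[0, 1]
r2 = np.corrcoef(x2, y)[0, 1]
full_r2 = r_squared(np.column_stack([x1, x2]), y)
sum_of_squared_r = r1**2 + r2**2

print(f"R^2 = {full_r2:.4f}, r1^2 + r2^2 = {sum_of_squared_r:.4f}")
```

The two quantities agree here only because the predictors are
uncorrelated; with intercorrelated predictors the equality no longer
holds.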

In nonexperimental research, however, the predictors are almost
always intercorrelated. The major sources of controversies with
respect to studies of schooling effects include various attempts to
partition variance and thereby to attribute specific portions of it to
specific predictors.



Incremental partitioning of variance. One way of partitioning
variance is to examine the increment in the proportion of variance
accounted for by each predictor as it is entered into a regression
analysis. This method was used in the Coleman Study (Coleman et al.,
1966) and in a series of IEA studies (e.g., Comber & Keeves, 1973;
Purves, 1973; Thorndike, 1973).

Coleman and his associates regressed student achievement scores
on student background characteristics, such as home SES, and on school
resources. It is the case that when predictors are intercorrelated,
the increment in variance attributed to a given predictor is
determined, in part, by its order of entry in the analysis; in other
words, the incremental variance is asymmetrical. In the Coleman
study, the student background characteristics were entered into the
analysis first and thus accounted for a large amount of variance,
leaving the effects of school factors negligible. In rationalizing
this procedure, Coleman and his associates argued that since student
background characteristics are "prior to school influence, and shape
the child before he reaches school, they will be controlled when
examining the effects of school factors" (p. 198). Pedhazur (1975),
however, argues that it is not a sufficient justification to control
one variable merely because it precedes another predictor.
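
The dependence of incremental variance on order of entry can be shown
with a small numerical sketch. The data are hypothetical and
deliberately constructed so that the two predictors are highly
intercorrelated; the variable names and the `r_squared` helper are
illustrative only and are not drawn from the Coleman data.

```python
import numpy as np


def r_squared(X, y):
    """R^2 from an ordinary least squares fit of y on X plus an intercept."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    total = y - y.mean()
    return 1.0 - (residual @ residual) / (total @ total)


# Hypothetical, intercorrelated predictors and a criterion.
background = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
school = np.array([2, 1, 4, 3, 6, 5, 8, 7], dtype=float)
achievement = np.array([3, 4, 6, 7, 11, 10, 15, 14], dtype=float)

r2_background = r_squared(background, achievement)
r2_school = r_squared(school, achievement)
r2_both = r_squared(np.column_stack([background, school]), achievement)

# Variance credited to school factors under each order of entry.
school_entered_first = r2_school
school_entered_last = r2_both - r2_background

print(f"school entered first: {school_entered_first:.3f}")
print(f"school entered last:  {school_entered_last:.3f}")
```

The same school variable is credited with a large share of variance
when entered first and with very little when entered after the
background variable, although the total proportion of explained
variance is unchanged.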

Darlington (1968) has discussed the use of various general
regression procedures, including the incremental partitioning of
variance, indicating they are valid when predictors are mutually
orthogonal but quite dubious otherwise. Creager (1971) has
proposed the use of a complete orthogonal factor analysis for
orthogonal decomposition of the regression system that would result
in orthogonal components that are still interpretable in terms of
the original variables.

Commonality analysis. As a solution for the asymmetry problem
involved in the incremental partitioning of variance, commonality
analysis, as developed by Mood (1969, 1971) and by Newton and
Spurrell (1967), partitions the explained variance in the criterion
variable into the variance that may be attributed uniquely to each of
the predictors and the variance that is to be attributed to various
combinations of predictors. The unique contribution of a given
predictor is the increment in the proportion of variance in the
dependent variable for which it accounts when entered last into the
regression analysis. The unique contribution is the same as the
squared part correlation of the criterion variable with a predictor
partialed on all other predictors in the regression equation. This
method was extensively used in the reanalysis of the Coleman study
data by Mayeske et al. (1969). In that reanalysis, the variance in the
criterion variable was partitioned into the following three major
portions: (1) that portion uniquely accounted for by student
background factors, (2) that portion uniquely accounted for by school
variables, and (3) that portion accounted for by the combination of
student background and school variables.
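
For the two-predictor case, the three portions described above can be
computed from three regressions. The sketch below uses hypothetical
data and an illustrative `r_squared` helper; it also checks the
statement that a unique contribution equals the squared part
correlation.

```python
import numpy as np


def r_squared(X, y):
    """R^2 from an ordinary least squares fit of y on X plus an intercept."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    total = y - y.mean()
    return 1.0 - (residual @ residual) / (total @ total)


# Hypothetical data for illustration only.
background = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
school = np.array([2, 1, 4, 3, 6, 5, 8, 7], dtype=float)
achievement = np.array([3, 4, 6, 7, 11, 10, 15, 14], dtype=float)

r2_background = r_squared(background, achievement)
r2_school = r_squared(school, achievement)
r2_both = r_squared(np.column_stack([background, school]), achievement)

# Unique contribution: increment when the predictor is entered last.
unique_background = r2_both - r2_school
unique_school = r2_both - r2_background
# Commonality: what remains of the explained variance.
common = r2_background + r2_school - r2_both

# Squared part correlation: residualize school on background, then
# correlate the residual with the criterion.
design = np.column_stack([np.ones(len(school)), background])
coef, *_ = np.linalg.lstsq(design, school, rcond=None)
school_residual = school - design @ coef
part_corr_squared = np.corrcoef(achievement, school_residual)[0, 1] ** 2

print(f"unique (background) = {unique_background:.3f}")
print(f"unique (school)     = {unique_school:.3f}")
print(f"common              = {common:.3f}")
```

In this made-up example most of the explained variance falls in the
common portion, which is the situation that makes the interpretation of
commonality analysis contentious.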



Werts (1968) advocated the use of commonality analysis instead
of the incremental partitioning of variance for studying schooling
effects. According to Pedhazur (1975), it has a utility viewed
from a predictive frame of reference. In other words, commonality
analysis can be used to determine which variable may be deleted with
a minimal reduction in the total proportion of variance. In fact,
Newton and Spurrell (1967) recommended commonality analysis
specifically for such a purpose. Despite the above, stepwise regression
analysis represents a more effective way to reduce the number of
predictors without affecting greatly the total proportion of variance.

Viewed from an explanatory frame of reference, commonality analysis
has very limited value. Pedhazur has suggested that "it might even be
argued that by its very nature it evades the problem of explanation,
or, at the very least, fails to come to grips with it" (p. 254).
Creager (1971), for one, called attention to difficulties in
interpreting the variance accounted for by a combination of predictors.
He indicated that two variables may be highly correlated because one
of them is the cause of the other, or because they both share a common
cause. Commonality analysis is unable to distinguish between the two.
Thus, it is the case that the uniqueness and commonality elements are
affected by the introduction of additional variables or by the deletion
of variables, when the predictors are intercorrelated.

Another difficulty with commonality analysis is that
commonality elements may have negative signs when suppressor variables
are involved and that as a consequence the sum of the unique
contributions of the predictors may then exceed 100 percent. The
former problem is not solved by arguing, as Mayeske et al. (1969)
did, that: "Negative commonalities will be regarded as equivalent to
zero" (p. 49). The solution for the latter problem should wait until
the former is resolved.
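
A negative commonality element is easy to produce with a constructed
suppressor variable. The sketch below is an illustration under
artificial assumptions: the four-observation vectors are hypothetical
and chosen so that the second predictor is uncorrelated with the
criterion but removes irrelevant variance from the first predictor.

```python
import numpy as np


def r_squared(X, y):
    """R^2 from an ordinary least squares fit of y on X plus an intercept."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    total = y - y.mean()
    return 1.0 - (residual @ residual) / (total @ total)


# Two orthogonal, zero-mean components (hypothetical).
signal = np.array([1.0, -1.0, 1.0, -1.0])
nuisance = np.array([1.0, 1.0, -1.0, -1.0])

criterion = signal                # the outcome depends on the signal only
predictor = signal + nuisance     # a contaminated measure of the signal
suppressor = nuisance             # unrelated to the outcome itself

r2_predictor = r_squared(predictor, criterion)
r2_suppressor = r_squared(suppressor, criterion)
r2_both = r_squared(np.column_stack([predictor, suppressor]), criterion)

unique_predictor = r2_both - r2_suppressor
unique_suppressor = r2_both - r2_predictor
common = r2_predictor + r2_suppressor - r2_both

print(f"unique contributions sum to {unique_predictor + unique_suppressor:.2f}")
print(f"commonality element = {common:.2f}")
```

Here the unique contributions sum to more than the total explained
variance and the commonality element is negative, so treating the
negative value as zero would leave the partition internally
inconsistent.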

It should be noted that a multiple dependent variable version
of the partitioning variance method has been proposed by Lohnes
and Cooley (1976).

Partitioning residual criterion variance. In the incremental
partitioning of variance and also in computing the uniqueness of a
predictor in commonality analysis, the effects of all predictors
that precede it have been partialed out. Some researchers (e.g.,
Astin & Panos, 1966; Dyer, 1970) partition the residual criterion
variance obtained by regressing the criterion or output variable
(e.g., achievement) on the input variables (e.g., home SES, pretest
scores). There is no difference in the prime analysis procedure
between this method and the two variance partitioning methods
discussed earlier. The difference is that criterion variables are
first residualized on some predictors or input variables and then
the resulting residual variance is used in partitioning.

In a series of college input studies, Astin (1970a, 1970b) and
his associates used an input-output model which involved a two-step
procedure for calculating a part correlation. In this procedure the
input variation was used to residualize the output variable. The
residualized student output variable was then correlated with the
college environment variables.
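
The two-step procedure can be sketched as follows. The scores are
hypothetical, a single input variable and a single environment variable
are assumed for simplicity, and the closing comparison with the
textbook part correlation formula is added here as a check rather than
taken from the studies cited.

```python
import numpy as np

# Hypothetical scores for eight students.
input_scores = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
environment = np.array([2, 1, 4, 3, 6, 5, 8, 7], dtype=float)
output_scores = np.array([3, 4, 6, 7, 11, 10, 15, 14], dtype=float)

# Step 1: residualize the output variable on the input variable.
design = np.column_stack([np.ones(len(input_scores)), input_scores])
coef, *_ = np.linalg.lstsq(design, output_scores, rcond=None)
residual_output = output_scores - design @ coef

# Step 2: correlate the residualized output with the environment variable.
part_correlation = np.corrcoef(residual_output, environment)[0, 1]

# Check against the part correlation formula built from
# zero-order correlations.
r_oi = np.corrcoef(output_scores, input_scores)[0, 1]
r_oe = np.corrcoef(output_scores, environment)[0, 1]
r_ie = np.corrcoef(input_scores, environment)[0, 1]
formula_value = (r_oe - r_oi * r_ie) / np.sqrt(1.0 - r_oi**2)

print(f"two-step part correlation: {part_correlation:.4f}")
print(f"formula value:             {formula_value:.4f}")
```

Note that only the output variable is residualized; the environment
variable retains whatever it shares with the input, which is what
distinguishes a part correlation from a partial correlation.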

In Dyer's (1970) student change model in an educational system,
the performance indicator of a school system is derived from the
residual output score of the system which was obtained from a
regression analysis using the input and "hard to change" variables
as predictors. After the performance indicators of educational
systems are obtained, they are studied in relation to the "easy to
change" surrounding conditions and the school process variables.

Among many problems related to the partitioning of residual criterion
variance, unreliability of change or "gain" scores, including residual
scores, is the most serious one. Although Dyer's model uses school
means rather than individual student scores, the reliability of the
residuals still may be questionable. In Dyer, Linn, and Patton's
(1969) cross-validation study, school residuals showed reasonable
stability across subsamples. Marco's (1974) study also showed that the
reliabilities of both individual and school residual scores were
relatively stable in cross-validation. Forsyth (1973), however, reported
that school residuals were unstable over time. Thus, it appears that
the residuals may be relatively stable from one subsample of students to
another within a single year, but relatively unstable from one year to
the next.

Problems involved in the partitioning of variance when the predictors
are intercorrelated are also relevant to this approach (Darlington, 1968).



Comparison of Regression Coefficients



The Coleman Study, which used a variance partitioning approach,
was criticized not only for its validity but also for its usefulness
as a guide for policy decisions (e.g., Bowles & Levin, 1968; Cain &
Watts, 1970; Hanushek & Kain, 1972). These critics of the study argued
that the proportions of variance accounted for by a given predictor
and by certain combinations of predictors would not, in general,
provide any guidance for policymakers to decide what course of action
should be taken to increase student achievement. Consequently, they
advocated comparing regression coefficients, a method whose purpose
is to assess the effects of each predictor on the criterion
variable. These same critics indicated that they preferred regression
coefficients to percentages of explained variance as estimators of
school effectiveness.



Unstandardized and standardized regression coefficients. In the
following linear, additive model regression equation

$$Y' = a + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p$$

the "b" weights should be treated as partial regression coefficients.
One interprets "b" as indicating the expected change in the criterion
variable "Y" for a unit change in predictor "X" (with which it is
associated), while holding all other predictors in the equation
constant. Such an interpretation of the "b" weights is said
to be valid only in experimental research. Michelson (1970), for one,
has indicated that it is incorrect to interpret a regression coefficient
obtained from nonexperimental research as the expected change in the
criterion variable resulting from a unit change in the predictor, while
holding all other predictors constant. Mosteller and Moynihan (1972)
noted that:



We can estimate the difference in achievement between
schools not having and those having a language laboratory,
say. But we cannot tell whether actually adding or
removing a language laboratory would produce nearly
the same differences. (p. 35)



In the standardized expression of the regression equation,

$$z'_Y = \beta_1 z_1 + \beta_2 z_2 + \cdots + \beta_p z_p$$

the standardized regression weights, betas, are scale-free indices
and thus can be compared across different predictors. In spite of
this advantage, some researchers (e.g., Cain & Watts, 1970; Linn,
Werts, & Tucker, 1971) prefer unstandardized coefficients. The main
reason for this seems to be that the β's are affected by the
variability of the variables within a specific population being studied,
while the b's remain fairly stable despite differences in the
variability of the predictors in different samples (Blalock, 1964).
There are problems, however, in interpreting unstandardized
coefficients in schooling effects studies. For one, the magnitude of
b's depends on the units of measurement
(e.g., cents or dollars), and many of those measures are not interval
variables (e.g., attitude). Smith (1972) reanalyzed the Coleman
Study data and found numerous examples in which comparisons based on
β's or b's led to contradictory conclusions in respect to the relative
importance of the same predictors. Smith recommended that β's should
be used when comparing coefficients for several predictors within a
given sample, while b's should be used when coefficients associated
with a given predictor are compared across samples.
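
The dependence of b's on the unit of measurement can be seen in a
minimal two-predictor sketch. All scores are hypothetical, and the
function name two_predictor_weights is introduced only for this
illustration. The same expenditure variable is expressed once in
dollars and once in cents: the b for expenditure changes by a factor
of 100, while its beta is unchanged.

```python
import math

def mean(v):
    return sum(v) / len(v)

def sd(v):
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def corr(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / ((len(x) - 1) * sd(x) * sd(y))

def two_predictor_weights(y, x1, x2):
    """Return ((b1, b2), (beta1, beta2)) for the regression of y on x1, x2."""
    r_y1, r_y2, r_12 = corr(y, x1), corr(y, x2), corr(x1, x2)
    beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
    beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
    # An unstandardized weight is the beta rescaled by the ratio of SDs.
    b1 = beta1 * sd(y) / sd(x1)
    b2 = beta2 * sd(y) / sd(x2)
    return (b1, b2), (beta1, beta2)

# Hypothetical school-level data.
spending_dollars = [400, 450, 500, 520, 600, 650, 700, 800]
attitude = [2, 4, 3, 5, 4, 6, 5, 7]
achievement = [52, 58, 57, 63, 62, 69, 68, 76]
spending_cents = [100 * d for d in spending_dollars]

b_dollars, beta_dollars = two_predictor_weights(
    achievement, spending_dollars, attitude)
b_cents, beta_cents = two_predictor_weights(
    achievement, spending_cents, attitude)

print("b per dollar:", b_dollars[0], " b per cent:", b_cents[0])
print("beta (dollars):", beta_dollars[0], " beta (cents):", beta_cents[0])
```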

Analysis of interaction effects. As a way of studying the
interaction or joint influence of predictors on a criterion variable,
researchers enter into a regression analysis the product terms for
the predictors. Anderson (1970) used this technique in his study
of the effects of classroom social climate on learning. He found
that four out of fourteen subscores from the Learning Environment
Inventory showed statistically significant interaction effects with
intelligence for girls. In another instance, Cronbach (1968) reanalyzed
some of the Wallach and Kogan (1965) data and questioned conclusions
found in their article. Cronbach used the incremental variance
partitioning method and reported a total of seven statistically
significant increments added by the interaction of intelligence and
creativity.

Pedhazur (1975) observed that the value of the concept of
interaction (or the nonadditivity issue) is dubious. He noted that

Attempts to interpret a regression coefficient for a
cross-product vector in the conventional manner create
an illogical situation in that one is led to state that
the coefficient indicates the expected change in Y
associated with a unit change in the cross-product
vector while holding constant all other variables
including those from which the cross-product vector was
generated. (p. 265)

It is also to be noted that the coefficients for cross-product
vectors are affected by, among other things, changes in the means of
the predictors from which they are generated (Darlington & Rom, 1972).
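
One sense in which the weight for a cross-product vector depends on
predictor means can be shown in a small sketch. The data, the model
y = a + b1*x + b2*z + b3*x*z, and the helper names (solve, ols,
interaction_weights) are assumptions made for illustration only. In
this particular setup, adding a constant to x leaves the
unstandardized weight of the product term unchanged when both
component predictors are in the equation, but its standardized weight
changes because the variability of the product vector changes.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def ols(y, predictors):
    """Least-squares weights [a, b1, b2, ...] from the normal equations."""
    cols = [[1.0] * len(y)] + [[float(v) for v in p] for p in predictors]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)

def sd(v):
    m = sum(v) / len(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def interaction_weights(y, x, z):
    """Return (b3, beta3) for the product term x*z."""
    xz = [p * q for p, q in zip(x, z)]
    b3 = ols(y, [x, z, xz])[3]
    return b3, b3 * sd(xz) / sd(y)

# Hypothetical scores: x = intelligence (rescaled), z = a climate subscore.
x = [1, 1, 1, 3, 3, 3, 5, 5, 5]
z = [1, 3, 6, 1, 3, 6, 1, 3, 6]
noise = [0.2, -0.1, 0.1, -0.2, 0.0, 0.1, 0.1, -0.2, 0.0]
y = [2 + 1.0 * p + 0.5 * q + 0.3 * p * q + e
     for p, q, e in zip(x, z, noise)]

b3, beta3 = interaction_weights(y, x, z)

# Same data with 10 added to every x score, so only the mean of x changes.
x_shifted = [p + 10 for p in x]
b3_shifted, beta3_shifted = interaction_weights(y, x_shifted, z)

print("b3:", b3, b3_shifted)
print("beta3:", beta3, beta3_shifted)
```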

Zero-Order Correlations and Nonlinear Regression Analyses 

In correlational studies, it is traditional to investigate the
linearity of the regression lines at the zero-order correlation level
before conducting further analyses. If upon inspection the regression
line appears to be nonlinear, an appropriate transformation is
recommended in order that linear assumptions be met.

Recently many researchers working in the field of teacher
effectiveness studies and classroom instructional variables have shown
strong interest in the study of zero-order correlations (e.g., Brophy
& Evertson, 1974; Rosenshine, 1976; Soar & Soar, 1973). Soar and
Soar (1976) reported findings that were not only interesting to
consider but were also consistent throughout four of their studies.
One of these findings was that of a nonlinear relationship, most likely
of an inverted "U" shape, between student gain in achievement and a
measure of teacher behavior. In general terms, the inverted "U"
suggests that student achievement is maximized with relatively
moderate amounts of certain teacher behaviors and that extremes of
the behavior, in either direction, tend to lead to reduction in
student achievement. Another finding, that of the differentiated
"U", suggests that different kinds of pupil learning varied in
respect to teacher behaviors associated with greatest pupil gain.
Brophy and Evertson (1974) and Brophy (1978) have also reported some
nonlinear distributions.

The statistical procedure most widely used in detecting the
nature of nonlinear relationships is the polynomial regression
analysis in which powers of variables are introduced in the regression
analysis. Cronbach (1976) made cautionary remarks against blind
search for nonlinear relationships: "Nonlinearities may reasonably
be explored, but unless there is a rationale for predicting nonlinearity
little credence can be given a nonlinear relationship the first time
it turns up" (p. 3.11).

Polynomial regression analysis has problems, too. First of all,
the nonlinear regression analysis can make a greater contribution in
an explanatory framework. However, it is difficult to interpret the
regression coefficients, even if they are unstandardized ones, in the
prediction framework. In addition, it is not legitimate to test the
significance of regression coefficients individually or to test
intermediate regression coefficients for the purpose of deleting those
that do not reach a prespecified level of significance. All a
researcher can do is to test coefficients successively for the purpose
of determining the pattern of the regression (Williams, 1959).
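
Successive testing can be sketched as follows: a linear term is fitted
first, a squared term is then added, and the increment in explained
variance is evaluated with an F ratio. The scores are hypothetical and
were chosen to follow an inverted "U"; the helper names (solve, ols,
r_squared) are introduced for this illustration and do not reproduce
any particular study's procedure.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def ols(y, predictors):
    """Least-squares weights [a, b1, b2, ...] from the normal equations."""
    cols = [[1.0] * len(y)] + [[float(v) for v in p] for p in predictors]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)

def r_squared(y, predictors):
    """Return (R squared, weights) for the regression of y on predictors."""
    w = ols(y, predictors)
    fitted = [w[0] + sum(wj * p[i] for wj, p in zip(w[1:], predictors))
              for i in range(len(y))]
    my = sum(y) / len(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, fitted))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot, w

# Hypothetical data: amount of a teacher behavior and mean student gain.
behavior = [1, 2, 3, 4, 5, 6, 7, 8, 9]
gain = [2, 5, 7, 8, 9, 8, 7, 5, 3]
behavior_sq = [b * b for b in behavior]

r2_linear, _ = r_squared(gain, [behavior])
r2_quadratic, w_quad = r_squared(gain, [behavior, behavior_sq])

# F ratio for the increment due to the squared term (1 and n - 3 df).
n = len(gain)
f_increment = (r2_quadratic - r2_linear) / ((1 - r2_quadratic) / (n - 3))

print("R2 linear:", r2_linear, " R2 quadratic:", r2_quadratic)
print("weight of squared term:", w_quad[2], " F:", f_increment)
```

A negative weight for the squared term together with a sizable
increment in explained variance is the pattern one would expect for an
inverted "U"; in keeping with Cronbach's caution, such a result would
still call for a rationale and for replication.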


Causal Models

Another approach to the study of schooling effects is causal
modeling in general and path analysis in particular. The technique
of path analysis was developed by Wright (1921) more than a half
century ago, but has not been widely used by educational researchers
(Tatsuoka, 1973). Blalock (1964) and Duncan (1966) introduced this
technique to sociology in the 1960s; Werts and Linn (1970b) were the
researchers who introduced it in education.

Path analysis is an analytic tool for theory testing. In order
to apply it, the researcher has to make explicit the theoretical
framework within which he/she operates. In fact, the application of
incremental partitioning of variance implicitly requires researchers
to formulate a causal model for specifying relations among variables
under study. In path analysis, causal models should be explicitly
expressed, for example, in path diagrams.

There are many ways to formulate a causal model, particularly
when the causes are unknown and/or unobserved. Wright has noted
that "in cases in which the causal relations are uncertain, the
method can be used to find the logical consequences of any particular
hypothesis in regard to them" (p. 557). This suggests that researchers
need to formulate not only one but many models, and must test
each one to determine if they are tenable. One of the main
advantages of path analysis is that it enables the researcher to
measure the direct and indirect effects one variable has upon another.
In addition, it enables researchers to decompose the correlation
between any two variables into a sum of simple and compound paths.
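
The decomposition can be illustrated with a three-variable recursive
model in which X1 affects X2, and both X1 and X2 affect X3. The
correlations below and the variable labels are hypothetical. Under
these assumptions the path coefficients are standardized regression
weights, and each correlation is recovered as a sum of a direct path
and a compound path.

```python
# Hypothetical correlations among three variables in a recursive model:
# X1 (home background) -> X2 (school process) -> X3 (achievement),
# with an additional direct path X1 -> X3.
r12, r13, r23 = 0.50, 0.40, 0.60

# Path coefficients: standardized weights from regressing each variable
# on the variables assumed to precede it.
p21 = r12
p31 = (r13 - r23 * r12) / (1 - r12 ** 2)
p32 = (r23 - r13 * r12) / (1 - r12 ** 2)

# Decomposition of r13: direct effect plus indirect effect through X2.
direct_13 = p31
indirect_13 = p32 * p21

# Decomposition of r23: direct effect plus a component due to the
# common cause X1.
direct_23 = p32
common_cause_23 = p31 * p21

print("r13 =", direct_13, "+", indirect_13)
print("r23 =", direct_23, "+", common_cause_23)
```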

Two types of causal analysis models, that is, the recursive and
nonrecursive models, can be distinguished. Issues and methods related
to the analysis of data within the context of each of these models
are examined next.

Recursive models. In recursive models, the hypothesized causal
relations among variables are unidirectional; that is, if X is a cause
of Y, then Y cannot be a cause of X. Simon's (1954) analysis, for
example, started from the bivariate case, and then moved to a
three-variable situation in which the basic concern was whether the
observed correlation between two variables was spurious due to the
presence of a third variable. Blalock (1964) has expanded the work of
Simon and has developed a technique to test for the existence of
linkages between variables in recursive models of any size.

Path coefficients in recursive models are usually obtained by
ordinary regression techniques, which comply with regression assumptions
and covariance restrictions that often lead to over-identification
problems (Asher, 1976). Among the assumptions that underlie the
application of the recursive path analysis, in addition to the
one-way causal flow assumptions, are that: (1) the relations
among the variables in the model must be linear, additive, and
causal; (2) the residuals cannot be correlated among themselves, nor
correlated with the variables in the system; and (3) the variables must
be measured on an interval scale (Kerlinger & Pedhazur, 1973). Unmet
assumptions might lead to sizable standard errors of the regression
coefficients and the path coefficients. The problem of
multicollinearity also arises with the use of the path analysis technique.

Comber and Keeves (1973) and Werts and Linn (1970b) used miniature
recursive models of educational effects for illustrative purposes. It was
McDonald and Elias (1976) who actually used the path analysis technique
in their educational effects study. Anderson and Evans (1974) used the
recursive models to reanalyze data from two studies that appeared in
the literature. Magidson (1977) applied this approach to the Cicirelli
et al. (1969) Head Start data and found small positive estimates of the
effects of a program which was originally judged to be totally ineffective.
Interestingly, Cicirelli et al. (1969) had earlier stated: "Results
from the summer program are so negative that it is doubtful that any
change in design would reverse the findings" (p. 245).

Nonrecursive models. In contrast to the recursive causal models,
nonrecursive models involve interdependence, feedback, and reciprocal
causation among at least some of the variables. The controversy
between Jencks et al. (1972) and Smith (1972) regarding the causal
flow between parents' expectations and student achievement in the
Coleman study would have been settled had a nonrecursive model been
used.

One advantage of nonrecursive models is that they do not require
the assumption that the residuals be uncorrelated. While this leads
to a gain in realism, it brings about problems in the level of
identification. When an equation is underidentified, there is no
exact solution that gives satisfactory estimates. In those cases, the
indirect least squares solution is to be used. In this procedure,
either certain coefficients are assumed to be zero or other exogenous
variables are introduced to the model. On the other hand, in a
nonrecursive just-identified equation or a nonrecursive overidentified
equation a two-stage least squares solution is usually used.
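
The logic of two-stage least squares can be sketched for the simplest
case of one endogenous predictor and one exogenous variable. The data
and the variable names are hypothetical, and the sketch assumes that
the exogenous variable affects achievement only through the endogenous
predictor. In the first stage the endogenous predictor is regressed on
the exogenous variable; in the second stage the criterion is regressed
on the first-stage predicted values.

```python
def mean(v):
    return sum(v) / len(v)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def slope(y, x):
    """Slope of the simple regression of y on x."""
    return cov(x, y) / cov(x, x)

# Hypothetical data. "exogenous" stands for a variable assumed to be
# excluded from the achievement equation (an assumption of this sketch).
exogenous = [1, 2, 3, 4, 5, 6, 7, 8]
expectation = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9]
achievement = [5.0, 6.2, 8.1, 9.0, 11.2, 12.1, 14.5, 15.2]

# Stage 1: regress the endogenous predictor on the exogenous variable.
stage1_slope = slope(expectation, exogenous)
expectation_hat = [
    mean(expectation) + stage1_slope * (z - mean(exogenous))
    for z in exogenous
]

# Stage 2: regress the criterion on the stage-1 predicted values.
b_2sls = slope(achievement, expectation_hat)

# With a single exogenous variable the estimate reduces to a ratio
# of covariances.
b_ratio = cov(exogenous, achievement) / cov(exogenous, expectation)

# Ordinary least squares estimate, for comparison.
b_ols = slope(achievement, expectation)

print("2SLS:", b_2sls, " ratio:", b_ratio, " OLS:", b_ols)
```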

The use of nonrecursive models requires a high degree of
theoretical and methodological conceptualization. In the field of
educational research, the studies that used nonrecursive models are
rare. Using the Coleman study data, Levin (1970) postulated a
nonrecursive model and applied the two-stage least squares method in
the estimation procedure. However, his attempt was primarily designed
to serve as an illustration. Anderson (1978) also provided an
empirical example of nonrecursive-type analysis using data from the
Evans and Anderson (1973) study.

When all has been said and considered, path analysis has a potential
to serve as a strong heuristic tool for the development of theories
of education. Tatsuoka (1973) has recommended its greater use in
educational research.



Other Developments in Study Approaches

Two relatively new methodological developments in educational
research are worthy of note. The first is the attempt to unify the
experimental and correlational research traditions. The second is
the effort to test results across studies for overall significance.
Both of these developments are discussed below.

Aptitude Treatment Interaction

As part of his APA presidential address, Cronbach (1957) urged
that the two major disciplines in psychology, that is, the experimental
and correlational, be unified. In effect, he was proposing the study
of "aptitude-treatment interactions." Almost twenty years later,
Cronbach (1975) indicated "that hybrid discipline is now flourishing"
(p. 116). At the same time he admitted that he and others had been
thwarted by inconsistent findings from roughly similar inquiries.
He indicated that it might be more fruitful to shift emphasis and
study higher order interactions as well as first-order ones. Recently,
Cronbach and Snow (1977), in a highly regarded book, synthesized
research in this area.

Krus and Krus (1978) have observed a reluctance among psychologists
to unify the discipline and remarked that:

The present schism between experimentalists and correlationists
seems to be due not to a different language, but to different
levels of language [...] linear model. The [...]
on raw scores (e.g., sum of squares) while correlationists
seem to prefer heuristic explanations at the standard score
level (e.g., variance). (p. 120)

As Cronbach and Snow concluded, more time is needed to achieve
a satisfactory level of unification or extension of the two disciplines.
Krus and Krus observe that in recent years some experimentalists have
gradually turned to regression methods, especially in cases in which
they were pursuing interactions and more complex hypotheses. Based
on these observations they concluded:

When one considers the gulf separating these two disciplines
a decade ago, the overall integrative power and conceptual
advantages of general regression theory seem to indicate
that Cronbach's original vision of the unified discipline
of scientific psychology is perhaps in the offing. (p. 123)

Meta Analysis

Another old but recently revitalized effort to synthesize the
results of independent studies can be found among researchers in the
fields of teaching and learning (e.g., Gage, 1976a; Glass, 1976;
Rosenthal, 1978). The proposed method, referred to as a meta
analysis, is described by Glass (1976) as "the analysis of analyses"
(p. 3).

The need for the meta analysis of research seems to be obvious.
In educational research, the findings vary in confusing irregularity
across contexts, subject matters, and countless other factors. In
order to design a study systematically on the basis of previous
findings, they first must be integrated in some fashion.

The origin of efforts to integrate findings from independent
studies can be traced to Fisher (1948) and even to Pearson (1938).
Since then, there has been a slow but steady increase in the amount
of literature addressed to the question of how one may obtain an
overall level of significance for results across studies. There
have been some relatively recent attempts to integrate research
findings by using a simple counting method; that is, by counting the
number of studies reporting favorable or unfavorable outcomes
(e.g., Bracht, 1970; Dunkin & Biddle, 1974; Jamison, Suppes, & Wells,
1974; Light & Smith, 1971). However, this relatively simplistic
method has not provided a satisfactory solution to the problem. More
recently, Rosenthal (1978) has described methods for combining the
probabilities obtained from two or more independent studies and
provided a guide for selecting appropriate methods. Examples of the
application of the meta analysis technique can be found in Gage
(1976b), Glass (1976), and Smith and Glass (1977).
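
One long-established way of combining probabilities from independent
studies is Fisher's method, in which minus twice the sum of the
natural logarithms of the p values is referred to a chi-square
distribution with 2k degrees of freedom. The sketch below is offered
only as an illustration of that general idea; the p values are
hypothetical, and the function name fisher_combined_p is introduced
here. It assumes that the studies are independent.

```python
import math

def fisher_combined_p(p_values):
    """Overall p value for k independent studies by Fisher's method."""
    k = len(p_values)
    # Half of the chi-square statistic, i.e. -sum(ln p).
    half_chi2 = -sum(math.log(p) for p in p_values)
    # Upper-tail probability of chi-square with 2k degrees of freedom,
    # using the closed form available for even degrees of freedom.
    return math.exp(-half_chi2) * sum(
        half_chi2 ** i / math.factorial(i) for i in range(k)
    )

# Hypothetical one-tailed p values from three independent studies,
# none of which reaches the .05 level on its own.
study_p = [0.08, 0.10, 0.06]
combined = fisher_combined_p(study_p)

print("combined p:", combined)
```

In this hypothetical case the combined probability falls below .05
although no single study does, which is the kind of information a
simple count of favorable and unfavorable outcomes would miss.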

A Summary of Study Approaches

In line with existing divisions of research on schooling effects,
the study approaches dimension was divided into two general categories:
(1) the experimental (which includes pre-experimental, true-experimental,
and quasi-experimental designs) and (2) the nonexperimental
(which includes a variety of correlational techniques).



Experimental Approaches

For purposes of discussion the experimental category was
subdivided into two components according to the presence/absence of
control or comparison groups. The two subdivisions were: (1)
experimental group only designs, and (2) control or comparison group
designs.

Experimental Group Only Designs. Experimental group only
designs, which consist mostly of pre-experimental designs, suffer
from the lack of internal validity (i.e., interpretability) and
external validity (i.e., generalizability). Experimental group only
designs of the quasi-experimental type, such as the time series,
equivalent time samples design, and equivalent materials samples
design, may provide more interpretable results than pre-experimental
designs, but still lack external validity.

When standardized tests are appropriately administered on a
pretest-posttest basis, a comparison with a norm group can be made
even though there is no true control group.

Control or Comparison Group Designs. The most serious problem
with control group designs is establishing equivalency between
treatment and control groups on entry measures. Randomization is an
essential (but not absolutely foolproof) manipulative procedure for
establishing initial equivalency for both true and quasi-experimental
designs. However, under most existing educational systems, it is
extremely difficult to arrange for the random
assignment of individual students to experimental and control groups.
It is less difficult to arrange for the random assignment of collectives
(e.g., classrooms, schools, and school districts) to different
treatment groups.

Even when the random assignment of collectives occurs,
differences between treatment and control groups are not always completely
eliminated. When initial differences are apparent, the use of
statistical procedures that take initial differences into account is
appropriate. Choosing, for example, between the multivariate analysis
of covariance or the repeated measures design of the multivariate
analysis of variance is a specific issue related to this problem area.

Nonexperimental Approaches 

Again, for purposes of discussion, the nonexperimental or
correlational approaches were subdivided according to the nature of the
coefficient calculated from a regression analysis. The four
subdivisions were: (1) partitioning of explained variance, (2) comparison
of regression coefficients, (3) causal models, and (4) nonlinear
regression methods. These subdivisions are not meant to be exhaustive,
and, as they are based upon the same regression model, neither are
they mutually exclusive.

Partitioning of explained variance. Two methods for the
partitioning of explained variance (i.e., incremental analysis and
commonality analysis) were discussed. In incremental analysis, the
relative contribution of a given predictor is determined on the basis
of the amount of increased variance accounted for when that predictor
is entered into the regression equation. The results of incremental
analysis are highly dependent on the order in which variables are
entered into the equation. Consequently, when an underlying theory
or hypothesis is controversial, the incremental analysis method cannot
be employed to resolve the theoretical concerns.

Commonality analysis does offer a solution to problems associated
with conflicting theories. In commonality analysis explained variance
is partitioned into portions explained by each predictor and by
combinations of predictors. Commonality analysis, therefore, provides
results unaffected by the order by which variables are entered into
the regression equation.
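
The contrast between the two partitioning methods can be made concrete
for the two-predictor case. The correlations below are hypothetical,
and the labels x1 and x2 stand for any two intercorrelated predictors
(e.g., a background measure and a school measure). The incremental
credit given to a predictor depends on whether it is entered first or
second, whereas the unique and common components do not depend on any
order of entry.

```python
# Hypothetical correlations: criterion with x1, criterion with x2,
# and x1 with x2.
r_y1, r_y2, r_12 = 0.50, 0.40, 0.60

# Squared multiple correlation for the two-predictor equation.
r2_full = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Commonality analysis: unique and common portions of explained variance.
unique_1 = r2_full - r_y2 ** 2
unique_2 = r2_full - r_y1 ** 2
common = r_y1 ** 2 + r_y2 ** 2 - r2_full

# Incremental analysis under the two possible orders of entry.
order_a = {"x1": r_y1 ** 2, "x2": r2_full - r_y1 ** 2}  # x1 entered first
order_b = {"x2": r_y2 ** 2, "x1": r2_full - r_y2 ** 2}  # x2 entered first

print("R2:", r2_full)
print("unique x1:", unique_1, " unique x2:", unique_2, " common:", common)
print("x1 credited:", order_a["x1"], "(first) vs", order_b["x1"], "(second)")
```

A predictor entered last receives only its unique portion; entered
first, it is also credited with the portion it shares with the other
predictor.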

Comparison of regression coefficients. Many researchers regard
the regression coefficient (either standardized or unstandardized)
as more meaningful for policy-making than explained variance.
Standardized coefficients are suitable for comparing the relative
influence of each predictor within a sample, while unstandardized
coefficients are useful for comparing the effect of a predictor
across samples.

Causal models. Causal modeling, specifically path analysis,
enables one to measure the direct and indirect effects that one
variable has upon another. It also enables researchers to decompose
the correlation between any two variables into a sum of simple and
compound paths. It can be used, therefore, in testing theoretical
hypotheses. Two types of causal models were considered: (1) recursive
models in which the hypothesized causal relations among variables
are unidirectional, and (2) nonrecursive models which include
interdependence, feedback, and reciprocal causation among some of the
variables. Nonrecursive models are more realistic than recursive
models.

Nonlinear regression methods. Educational researchers interested
in studying the relationships between classroom processes and student 
outcomes increasingly have attended to the issue of determining the 
true nature of the functions descriptive of the relationships. In 
particular, polynomial regression has been used in recent studies to 
identify nonlinear relationships. 

Combined Study Approaches

A number of efforts to combine the experimental and correlational
study approaches have been initiated. Cronbach was an early advocate
for the unification of the experimental and correlational approaches
to research. The aptitude x treatment interaction studies are what
Cronbach has advocated. Some researchers see that these efforts have
just been started. However, it seems to be fair to say that
considerable progress has been made in the area of aptitude x treatment
interaction studies.



New methods for integrating results across studies have been 
developed and even utilized in a few studies. These methods will 
prove valuable in attempts to understand previous findings. 
IV. ISSUES AND PROBLEMS RELATED TO THE UNIT OF

ANALYSIS AND THE [...] OF [...]

Data collected at a given level, say, at the classroom level, may
be aggregated to higher levels (e.g., the school or district level), or
perhaps disaggregated to lower levels (i.e., the individual student
level), or be retained and analyzed at the level at which it was
originally collected. The analysis and interpretation of data that can be
aggregated (or disaggregated) to different levels constitute an
important methodological concern for the educational researcher who must
select those "units of analysis" that are the most appropriate given
the research question and other constraining factors. This selection
problem is especially critical in large-scale studies of schooling
effects, since multilevel data are collected in virtually all such
investigations "simply because schools are, in part, aggregates of their
teachers and pupils, and classrooms are aggregates of the processes and
persons within them" (Burstein & Smith, 1977, p. 66).

When multilevel data have been collected, the researcher has the
option either of aggregating data to higher levels or of disaggregating
to lower levels or, more specifically, of using the collected data as
proxies for lower level data. A seldom used third option is to engage
in some form of multilevel analysis. Relative to this latter point,
Burstein and Linn (1977) note:

The effects of education exist in one form or another
both between and within the unit at each level of
educational systems. Yet the majority of studies of
educational effects has restricted attention to either
overall between-student, between-class, or between-school
analyses. (p. 1)



One consequence of the above is a lack of consistency in findings across
studies.

Sometimes, however, the researcher has a severely restricted set of
options and must choose units of analysis at less than desirable aggrega- 
tion levels. This is especially true when, say, data at the individual 
level may not be obtainable, or if obtainable, not identifiable for each 
individual (and, therefore, not relatable across data collected at dif- 
ferent times or on different forms). Cost factors also may enter into 
decisions to select a particular unit for analysis and not others; it is 
normally cheaper to analyze data at higher levels of aggregation. 

The selection of appropriate levels of analysis is not only an
analytic concern; it relates also to the problems of interpretation, or more
specifically of making inferences about relationships found at one level
to relationships at other levels. This latter problem, known as the
"fallacious inference" issue, is best understood by reviewing a study by
Robinson (1950), who found that the size of correlations between illiteracy
and race was a function of the units of analysis: .95 at the regional
level, .77 at the state level, and .20 at the individual level. Had it
been assumed that data aggregated to the regional level would provide the
same information as data on the individual level, a fallacious inference
surely would have been made (Alker, 1969).
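
The dependence of a correlation on the unit of analysis can be
reproduced with a small hypothetical data set. The nine individual
scores and the three groups below are invented for illustration and
are unrelated to Robinson's census data. The correlation computed on
individuals is moderate, while the correlation computed on group means
is perfect.

```python
import math

def mean(v):
    return sum(v) / len(v)

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (x, y) scores for individuals nested in three groups.
groups = {
    "A": ([1, 3, 5], [4, 2, 3]),
    "B": ([3, 5, 7], [5, 6, 4]),
    "C": ([5, 7, 9], [8, 6, 7]),
}

# Individual level: pool all persons and ignore group membership.
x_all = [v for xs, _ in groups.values() for v in xs]
y_all = [v for _, ys in groups.values() for v in ys]
r_individual = corr(x_all, y_all)

# Aggregate level: correlate the group means.
x_means = [mean(xs) for xs, _ in groups.values()]
y_means = [mean(ys) for _, ys in groups.values()]
r_group = corr(x_means, y_means)

print("individual level r:", r_individual)
print("group level r:", r_group)
```

Inferring the individual-level relationship from the group-level
coefficient in such a case would be an instance of the fallacious
inference described above.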

Units of analysis issues are discussed in each of the following
sections. The first deals with units of analysis as a general problem in
educational research. The second section focuses on issues that must be
considered when selecting appropriate units of analysis. A third section
contains a discussion pertaining to the analysis of multilevel data in
general. A brief summary is provided as a final section.

Units of Analysis Issues in Educational Research

The Coleman Report (Coleman et al., 1966), in which the school was
treated as the unit of analysis, prompted educational researchers committed
to the use of the student or classroom as the unit of analysis to
reexamine more closely issues and problems related to units of analysis.
One such reexamination led Burstein and Linn (1977) to conclude: "efforts
to identify the effects of education . . . on pupil performance have
suffered from a lack of attention to the complications caused by the
multilevel character of educational data" (p. 1). This is somewhat unsettling,
since unit of analysis and data aggregation problems have long attracted
the attention of other behavioral and social scientists. The study
mentioned earlier by Robinson, a sociologist, is a case in point. In
psychology, Estes (1956) argued that group learning curves said to show
gradual learning may in reality be a composite of individual curves
reflecting "quantum" learning. In economics, it was shown that the procedure
of combining preferences or demand functions at the individual or family
level was not useful in forecasting export and import demands at the
national level (Scheuch, 1966).

The units of analysis issue was first publicly debated in education
by Wiley, Bloom, and Glaser (Wittrock & Wiley, 1970). It was Wiley's
contention that "if the object of evaluation is a typical classroom
instructional program where the instruction is received simultaneously by all
students in the class, the appropriate vehicle (or sampling unit) is
the class and not the individual student" (p. 264). Bloom, and later
Glaser, argued that the unit of analysis should be individual students
because it is students the school teaches and it is the effects on them
that should be the focus of evaluation.

More recently, Brophy (1977) has argued that in schooling effects
studies concerned with the nature of classroom interactions and the
relationships between those interactions and student outcome measures, the
student rather than the class mean should be the basic unit of analysis.
He listed two reasons: (1) most teacher behavior directed to students is
really directed at individuals rather than the whole class, and (2) even
teacher behavior directed at the whole class interacts with individual
student differences to determine outcomes.

There is agreement among some researchers that the end result of
aggregation is a loss of information and the possible introduction of
systematic error (e.g., Burstein, 1976; Hannan & Burstein, 1974; Haney,
1975). Haney (1974), using a small set of the Project Follow Through data,
set about to demonstrate that the method by which data are aggregated
(and, therefore, how units of analyses are formed) could affect analytic
results. He reaggregated the Follow Through data into class and
school-sized groupings using three different methods: (1) by random assignment,
(2) by pretest scores, and (3) by posttest scores. In all three of these
artificial groupings, correlations between pretest and posttest scores
increased when data aggregated to the class and school levels were used.
In contrast, correlations based on the aggregation of non-simulated
data decreased. From these findings, Haney inferred that "when we aggregate
data to the classroom level we confound all other causal variables of our
outcome measure with classroom level effects" (p. 30). Haney issues the
following warnings:

The demonstration of its existence should make us highly
wary of drawing inferences across different levels of
analysis . . . just because variables have a particular
relationship at the school level, is not sufficient reason
to infer that the same relations hold at the class or
individual level. Before we can make inferences across
levels of analysis with any confidence, we must examine
the aggregation relations and the potential manner in
which they may artificially confound relationships between
variables. (p. 31)
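
The dependence of aggregate correlations on the grouping rule can be
illustrated with a small simulation. The sketch below is not Haney's
procedure or data: the pupil scores are synthetic, and the group size of
25, the pretest-posttest slope of 0.6, and all function names are
assumptions chosen for illustration. It forms artificial "classes" in
the three ways described above and correlates the group means.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def group_means(pairs, size):
    """Cut an ordered list of (pretest, posttest) pairs into consecutive
    groups of `size` and return the two lists of group means."""
    groups = [pairs[i:i + size] for i in range(0, len(pairs), size)]
    pre = [sum(p for p, _ in g) / len(g) for g in groups]
    post = [sum(q for _, q in g) / len(g) for g in groups]
    return pre, post


# Synthetic pupils (an assumption, not Follow Through data).
rng = random.Random(0)
pupils = []
for _ in range(2000):
    pre = rng.gauss(0, 1)
    pupils.append((pre, 0.6 * pre + rng.gauss(0, 0.8)))

r_individual = pearson([p for p, _ in pupils], [q for _, q in pupils])

# (1) random grouping, (2) grouping by pretest, (3) grouping by posttest
shuffled = pupils[:]
rng.shuffle(shuffled)
r_random = pearson(*group_means(shuffled, 25))
r_by_pretest = pearson(*group_means(sorted(pupils, key=lambda s: s[0]), 25))
r_by_posttest = pearson(*group_means(sorted(pupils, key=lambda s: s[1]), 25))

print(r_individual, r_random, r_by_pretest, r_by_posttest)
```

In this synthetic setting, grouping on either score pushes the
correlation of group means well above the pupil-level value, which is
the sense in which the aggregation rule itself shapes the result.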

Grunfeld and Griliches (1960), however, suggest that aggregation in some
cases may lead to a gain. And Hannan (1976) identified two special cases
in which aggregation conceivably could lead to a gain: (1) aggregation
that minimizes variation in confounding variables, and (2) aggregation
by true scores.

Data from more recent studies of schooling effects are difficult to
interpret because data collected mostly at the individual student level
have been analyzed at higher levels of aggregation. For example, student
data have been aggregated to the classroom level (e.g., Poynor, 1976;
Soar, 1973; Walberg, 1969), the school level (e.g., Coleman et al.,
1966; Hanushek, 1968), the school district level (e.g., Kiesling, 1970;
Bidwell & Kasarda, 1975), the state level (Walberg & Rasher, 1974), and
even the national level (e.g., Bidwell, 1975; Comber & Keeves, 1973;
Thorndike, 1973).

Selecting Units of Analysis

A crucial decision point in the conduct of educational research is
the selection of appropriate units of analysis. In making this selection
"it is essential to have a clear picture of the spectrum of possible
units, so that the choice based on the research problem may be a fruitful
one" (Galtung, 1967, p. 37). In schooling effects studies, the spectrum
of interest ranges from the individual student at the lowest level to
perhaps the nation at the highest. In between these levels we find
collectives (i.e., levels of aggregation) such as: small groups,
classrooms, grades, schools, school districts, states, and regions. This
section reviews issues that are to be considered in the process of
selecting units of analysis that are the most appropriate given the
purpose of the study and other constraining factors. Following Haney's
(1974) format, major issues are discussed under the four following
headings: (1) purpose of the study, (2) study design, (3) statistical
considerations, and (4) practical considerations.

Study Purpose Considerations 

To a great extent, the specific questions research seeks to answer
dictate how a particular study will be conducted. It follows that the
most basic consideration in the selection of a unit of analysis should
be the purposes for which the study was undertaken. And while this is
essentially true, it is also the case that "we cannot base our selection
solely on the implications of the analysis questions" (Haney, 1976, p.
47). The determination of units of analysis, he says, is confounded with
other issues, such as study design and data analysis issues, for example.

Studies differ in respect to the questions they attempt to answer
and sometimes questions may properly be answered only by analyzing
data at several different levels of aggregation. In the context of
Project Follow Through, the questions that were to be answered reflected
the need to analyze data at more than one level of aggregation. On the
other hand, the questions posed by Brophy and Evertson (1977) could be
answered best if the unit of analysis was the individual student.

Study Design Considerations

Study design, an issue overlapping with the purpose of the study, 
is another consideration when selecting a unit of analysis. According 
to Haney (1974), the three factors which give clues for selecting units 
of analysis are: (1) the units of treatment, (2) the independence of 
treatment units, and (3) the appropriate size for units.

Units of treatment. As a general rule, the unit of analysis should
be the lowest level of aggregation at which units can receive different
treatments or different replications of the same treatment (Cronbach,
1976; Glass & Stanley, 1970; Haney, 1974). At issue is how one
determines the "unit of treatment." The sampling unit may be of some help
in making such a determination (Burstein & Smith, 1977; Cline et al.,
1974; Cronbach, 1976). For example, if randomization was used as a
device for controlling initial differences, the units randomly assigned
to treatment and control groups could be regarded as the unit of analysis
(Haney, 1974). Sometimes, however, it is necessary to distinguish the
sampling unit from the unit of treatment. Such a differentiation was
needed in the Performance Contracting Experiment (Ray, 1972).



from each was assigned to the treatment condition. In this instance the 
district was the sampling unit, and the school was the unit of treatment
and, hence, the appropriate unit of analysis.

Independence of treatment units. Another design consideration is
statistical independence as addressed by Glass and Stanley (1970).
According to the principle of independence, there should be no way in
which the treatment applied to one unit should overlap or affect
observations on another unit of treatment. If, for example, every student
in the class correctly answers a question only because he/she earlier
heard the teacher provide the answer to it when asked by student A,
these responses are not independent; they are, in fact, dependent on the
question asked by student A. Using simulated data, Glendening (1976)
demonstrated that failure to meet assumptions of statistical independence
of treatment when between-student analysis was performed could cause
misleading results. In light of her findings, Glendening was forced to
conclude:

When dealing with educational data, in almost all cases,
the group unit, such as classrooms, should be the unit of
analysis. If, however, the data do happen to be independent
of each other, it is clearly advantageous to use the
individual unit as the unit of analysis. (p. 46)
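
The consequence of ignoring dependence among pupils in the same class
can be sketched with a simulation. This is not Glendening's simulation;
the design (10 classes of 20 pupils per arm, a shared class influence
with standard deviation 0.5, no true treatment effect) and all names are
assumptions made for illustration. The same data are tested once with
the pupil as the unit and once with the class mean as the unit, and the
proportion of false rejections at the nominal 5% level is recorded.

```python
import random


def t_stat(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((x - mb) ** 2 for x in b)
    pooled = (ssa + ssb) / (na + nb - 2)
    return (ma - mb) / (pooled * (1 / na + 1 / nb)) ** 0.5


def simulate(trials=400, seed=1):
    """False-rejection rates with the pupil and with the class as unit.

    Fixed design: 2 arms x 10 classes x 20 pupils, no treatment effect.
    The two-sided 5% critical values are hard-coded for this design:
    1.966 for df = 398 (pupils) and 2.101 for df = 18 (class means).
    """
    rng = random.Random(seed)
    reject_pupil = reject_class = 0
    for _ in range(trials):
        arms = []
        for _arm in range(2):
            classes = []
            for _c in range(10):
                shared = rng.gauss(0, 0.5)  # influence common to one class
                classes.append([shared + rng.gauss(0, 1) for _ in range(20)])
            arms.append(classes)
        pupils = [[s for c in arm for s in c] for arm in arms]
        means = [[sum(c) / len(c) for c in arm] for arm in arms]
        reject_pupil += abs(t_stat(*pupils)) > 1.966
        reject_class += abs(t_stat(*means)) > 2.101
    return reject_pupil / trials, reject_class / trials


pupil_rate, class_rate = simulate()
print(pupil_rate, class_rate)
```

Under these assumptions the pupil-level test rejects a true null
hypothesis far more often than 5%, while the class-level test stays near
its nominal rate.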

On the other hand, Cronbach (1976) argues that "analysis at the level of
. . . is likely to have no justification in science or policy studies
unless the collective is in some real sense a carrier of an effect." He
also indicates that "in educational research it does seem reasonable to
think of classrooms and schools and districts as having real enough
effects" (p. 1.19a).

It may be asked if it is feasible to choose a unit of analysis on
the basis of a test for independence. In response, Glendening (1976)
replied: "As a general rule of thumb a preliminary test of independence
should not be used to choose a unit of analysis to test for treatment
differences" (p. 48). The basis for choosing a unit of analysis should
be the study design itself and careful observations to determine if the
design was adhered to.

Appropriate size for units. A final design consideration with regard
to choosing a unit of analysis concerns the "appropriate" size for
the analysis unit. Given a limited amount of experimental material, the
problem, which may not be solvable under actual research conditions,
becomes one of determining the "unit size which will most reduce the
variance of the estimated difference between two treatments" (Haney,
1974, p. 56).

Statistical Considerations

A variety of statistical issues must be considered in the process
of selecting a unit of analysis. Not surprisingly, these considerations
are related to the questions the analysis is intended to answer and its
design. Haney (1974) has suggested three kinds of statistical
considerations: (1) measurement reliability, (2) degrees of freedom, and
(3) nonequivalence of treatment groups. Issues and problems pertinent
to these topics are discussed in this section.



Measurement reliability. In Section II of this paper, the issue
of measurement reliability was discussed in respect to measures of
effects. If the research principle is accurate, the measures used to
assess treatment effects must be reliable. A number of factors may affect
the reliability of measures; among them is the level of aggregation. It
has been generally known that measurement reliability increases as scores
are aggregated to higher levels. Haney indicates, however, that this is
true when the reliability coefficients are computed based on Shaycoft's
(1962) model that does not account for group characteristics. When an
alternative model proposed by Wiley (1970) was used to estimate the
reliability of the same data, quite different coefficients were obtained.
Haney's approach to the problem is somewhat unique, and is used also in
identifying components of variance. He suggests that

if a particular score has a relatively large component
of variance between classes, then it makes sense to
examine it using the class as a unit-of-analysis.
Conversely, if there is little variance between classes . . . ,
then it is less useful to perform a class level analysis. (p. 69)
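
One way to gauge how large the between-class component is can be
sketched as follows. The estimator shown is the standard one-way
random-effects analysis of variance for equal class sizes; it is offered
as an assumption about how such a component might be estimated, not as
the procedure Haney used. The function name and the toy scores are
hypothetical.

```python
def variance_components(classes):
    """Estimate (between-class, within-class) variance components from a
    list of equal-sized lists of scores, one list per class."""
    k, m = len(classes), len(classes[0])
    means = [sum(c) / m for c in classes]
    grand = sum(means) / k
    # Mean squares between and within classes.
    msb = m * sum((mj - grand) ** 2 for mj in means) / (k - 1)
    msw = sum((x - mj) ** 2 for c, mj in zip(classes, means) for x in c)
    msw /= k * (m - 1)
    # A negative estimate is truncated at zero.
    between = max((msb - msw) / m, 0.0)
    return between, msw


# Toy data: three classes of three pupils with well-separated means.
scores = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
between, within = variance_components(scores)
share_between = between / (between + within)
print(between, within, share_between)
```

A share near 1 indicates that most of the variance lies between classes,
the situation in which the quoted passage favors the class as the unit;
a share near 0 indicates the opposite.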

Degrees of freedom. It is well known that as the degrees of freedom
increase, the precision of an estimate improves, and equally important
the "power" of the relevant statistical test increases. This
understanding is an important statistical consideration in selecting a
unit of analysis, as the degrees of freedom change from one level of
analysis to another. In this regard Emrick, Sorensen, and Stearns (1973)
noted that "aggregating pupil level data to the classroom level . . .
appears to shift evaluation focus from the individual child and to reduce
statistical power and precision by decreasing observations" (p. A-5).
Smith (1972), in contrast, employed the degrees of freedom argument to
justify the use of classroom level data instead of the individual
student data: "Had we used the child as the unit of analysis we would
have been seriously overestimating the number of degrees of freedom
available" (p. 108).

Haney (1974) believes that since the degrees of freedom issue has 
been used to argue for analysis at the lower and higher levels, this 
"cannot help but raise doubt about its validity" (p. 71). In respect to 
the issue of statistical significance, Haney concludes: 

The only time we ought to be concerned with degrees of
freedom argument as it relates to statistical significance
is when there are so few observations that we
cannot distinguish statistical significance between
effects estimates which are of a magnitude such that
we would otherwise consider them educationally
significant. (p. 72)

In respect to significance testing in nonexperimental situations,
Haney concluded: "The degrees of freedom argument as a guide to the
selection of a unit-of-analysis is . . . at best a heuristic one" (p. 74).
When experimental designs are employed in schooling effects studies, the
degrees of freedom argument should seriously be considered in deciding on
units of analysis. This is the case because of known effects of units of
analysis on the "power" of statistical tests.
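
The two uses of the degrees of freedom argument can be placed on a
common scale with the familiar design-effect approximation for clustered
samples. That approximation is brought in here as an outside assumption;
it is not part of the sources reviewed above, and the function name and
the example numbers are hypothetical.

```python
def effective_sample_size(n_pupils, class_size, icc):
    """Approximate number of independent observations carried by
    n_pupils pupils who are clustered in classes of class_size, given an
    intraclass correlation icc (design effect = 1 + (m - 1) * icc)."""
    return n_pupils / (1 + (class_size - 1) * icc)


# 500 pupils in 20 classes of 25.
n_independent = effective_sample_size(500, 25, 0.0)   # pupils independent
n_moderate = effective_sample_size(500, 25, 0.2)      # moderate clustering
n_dependent = effective_sample_size(500, 25, 1.0)     # classmates identical
print(n_independent, n_moderate, n_dependent)
```

With no clustering the count equals the number of pupils, and with
complete dependence it falls to the number of classes. On this reading,
treating pupils as the unit and treating classes as the unit are the two
extremes of one continuum.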

Non-equivalence of treatment groups. Another statistical issue to
be considered when deciding on units of analysis involves the method of
adjusting for initial group differences and, by implication,
disattenuation of the covariate. It may be remembered that Campbell and
Erlebacher (1970) argued that the magnitude of adjustment bias will
depend in part on the reliability of the covariate. Since covariate
reliability is known to be affected by data aggregation, the relationship
to the unit of analysis selection decisions becomes apparent. Cronbach
(1976) has indicated that "the unit of analysis can make a difference in
the estimate of a covariate-adjusted treatment mean, when persons or
classes have not been assigned to treatments at random or when the number
of independent assignments to treatment is small" (p. 1.3).

At issue is how the adjustments are to be made. Haney (1974) posits
that "adjustment of posttest scores for pretest should be made at the
pupil level prior to any aggregation to higher levels of analysis if the
full effect of the adjustment is not to be lost" (p. 29). In an analysis
of Follow Through data, Abt Associates (Cline et al., 1974) adjusted for
the fallibility of the pretest covariate at the individual student level
only. They argued that classroom and school level data are much more
"stable" and not in need of correction. Cronbach (1976), in contrast,
holds the position that "group regressions may be just as fallible as
individual ones" and, given this proposition, argues that "class and
school analyses of covariance ought to be disattenuated when assignment
is not random" (p. 1.8).
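
What "disattenuation" of a fallible covariate does can be sketched at
the pupil level. The data below are synthetic, the covariate reliability
of 0.8 is known only because the data are constructed that way, and the
simple correction (dividing the observed slope by the reliability) is
the textbook one; none of this reproduces the Abt Associates analysis.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)


rng = random.Random(2)
true_pretest = [rng.gauss(0, 1) for _ in range(5000)]
# Observed pretest = true score + measurement error (error SD 0.5).
observed_pretest = [t + rng.gauss(0, 0.5) for t in true_pretest]
# Posttest depends on the true pretest score with slope 0.7.
posttest = [0.7 * t + rng.gauss(0, 0.5) for t in true_pretest]

# Reliability = true variance / observed variance = 1 / (1 + 0.25).
reliability = 1.0 / (1.0 + 0.5 ** 2)

b_observed = slope(observed_pretest, posttest)   # attenuated toward zero
b_corrected = b_observed / reliability           # disattenuated estimate
print(b_observed, b_corrected)
```

Because an attenuated slope under-adjusts for initial differences, the
size of the correction, and hence the reliability assumed at each level
of aggregation, bears directly on the adjusted treatment comparison.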

Practical Considerations 

A number of practical considerations must also be given attention in
selecting a unit of analysis. Haney (1974) assigned four issues to this
category: (1) missing data, (2) policy research, (3) length of
investigation, and (4) economy.

Missing data. In large-scale studies it is likely that some data
will be missing; that students will have been absent in a particular data
gathering period. There is, of course, no way that partial data can be
used at the individual student level. With higher level units, partial
data can be used. Dyer, Linn, and Patton's (1969) study indicates,
however, that missing data may cause serious problems in obtaining
discrepancy measures, even though data were analyzed at the school level.

Policy research. The purpose of policy research is that of
improving policy rather than testing or improving theory. Given the above,
Haney advocates the selection of a unit of analysis at a level "at which
policy manipulable variables can best be studied" (p. 93).

Length of investigation. Evaluating an educational program over the
course of years further complicates the unit of analysis issue. "The
problem is that life of a classroom as a natural unit in most schools is
only a single year" (Haney, 1974, p. 82). Under such conditions, it would
be difficult to use the classroom as the unit of analysis in a multiyear
analysis.

Economy. The final practical consideration is that of economy. Haney
(1974) indicates:

If a unit-of-analysis larger than the pupil is employed in an
evaluation study, it is possible that a savings can be made by
sampling only some of the sub-units within the desired
units-of-analysis. (pp. 83-84)



In short, there is no simple way to select appropriate units of analysis.
Indeed, some criteria discussed above may suggest directions that are in
contradiction to one another. It is essential, therefore, that the
researcher arrange these considerations in order of priority to optimize
the selection of appropriate units of analysis.

Analyses of Multilevel Data

In schooling effects studies, it is not uncommon for researchers to
have collected data at different levels or to have collected data that
can be aggregated at different levels. This may represent an opportunity
for researchers to analyze the data at multiple levels of aggregation
(Burstein & Smith, 1977; Haney, 1974). Three different types of multiple
level analyses can be discerned: (1) parallel analyses across levels of
aggregation (Haney, 1976; Maw, 1976), (2) contextual analyses (Barton,
1970; Bowers, 1968), and (3) multilevel analyses (Burstein & Linn, 1976;
Cronbach, 1976; Cronbach & Webb, 1975; Erlebacher, 1977; Keesling &
Wiley, 1974).

Parallel Analyses

Multilevel data can be analyzed for each level of aggregation in
such a way that only variables from the same level of aggregation enter
into the analysis. When this type of single-level analysis is repeated
at more than one level of aggregation, it is referred to as "parallel
analysis." In the 1971-72 evaluation of Project Follow Through, Abt
Associates employed single-level analyses at the student, class, and
school levels; that is, they employed a parallel analysis strategy for
data analysis. It is claimed that one advantage of parallel analysis is
that it allows the researcher to study the consistency of results across
levels of aggregation.

Contextual Analyses

A mixture of variables which represent a unit and those which
represent the characteristics of its supra-unit can be used in an
analysis, called contextual or compositional analysis, to study the
effects of the supra-unit. For example, a mixture of student-level and
school-level aggregates of student variables can be found in many
schooling effects studies (e.g., Bowers, 1968; Coleman et al., 1966;
Farkas, 1974). Coleman et al. (1966) found that certain contextual
variables pertaining to characteristics of the student body explained
additional variance in individual student achievement above and beyond
that explained by the same characteristics at the individual level.
Coleman and his associates argued that the academic climate of the school
(i.e., contextual variables) has a direct influence on student
performance.

Hauser (1970) opposed this kind of contextual interpretation and
called it a contextual fallacy: "A not very distant cousin of the
aggregative or ecological fallacy . . . since both involve
misinterpretation of the between group or ecological correlations"
(p. 659). In the same article, he demonstrated a contrived contextual
effect, namely, that educational aspiration of students rises as the
proportion of males in a high school student body increases. He then
demolished the claim for a contextual effect by reinterpreting the
global sex ratio variable as a proxy for such variables as IQ and SES.
The groups with high male-to-female ratios also were higher in the
proportion of students with high IQs and high SES.

Hauser's point is essentially concerned with "specification error."
He noted:

In a purely logical sense this objection can never
be met because there are always "other" variables.
From a practical standpoint, the objection means
that one should be prepared to argue that his theory
of relations among individual attributes is complete
and correct, or at least defensible in relation to
some explicit criterion, before speculating about
residual group differences (p. 660).

Smith (1972), in a related study, included more background control
variables in his reanalysis of the Coleman data and found no evidence
"that characteristics of the student body have a strong independent
influence on the verbal achievement of individual students" (p. 280). The
results of Smith's reanalysis support Hauser's viewpoints.
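
The specification-error argument can be made concrete with synthetic
data in which, by construction, no contextual effect exists. In the
sketch below the outcome depends only on an individual attribute z; the
analyst first observes a fallible individual measure x and the school
mean of x, and then adds z as a control. All variable names, sample
sizes, and coefficients are assumptions for illustration and do not
reproduce Hauser's or Smith's analyses.

```python
import random


def ols(columns, y):
    """Least-squares coefficients (intercept first), obtained by solving
    the normal equations with Gauss-Jordan elimination."""
    rows = [[1.0] + list(r) for r in zip(*columns)]
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for i in range(p):
        pivot = max(range(i, p), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(p):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [v - f * w for v, w in zip(a[r], a[i])]
    return [a[i][p] / a[i][i] for i in range(p)]


rng = random.Random(3)
x, z, y, school_mean_x = [], [], [], []
for _ in range(50):                              # 50 schools
    school_level = rng.gauss(0, 1)               # schools differ in intake
    zs = [school_level + rng.gauss(0, 1) for _ in range(40)]
    xs = [v + rng.gauss(0, 1) for v in zs]       # fallible measure of z
    ys = [v + rng.gauss(0, 0.5) for v in zs]     # outcome depends on z only
    x += xs
    z += zs
    y += ys
    school_mean_x += [sum(xs) / len(xs)] * len(xs)

# Contextual model that omits z: the school mean looks like an "effect".
_, b_x, b_context = ols([x, school_mean_x], y)
# Same model with the individual attribute z controlled.
_, _, b_context_controlled, _ = ols([x, school_mean_x, z], y)
print(b_context, b_context_controlled)
```

In this constructed case the apparent contextual coefficient is sizable
when z is omitted and close to zero once z is included, which is the
pattern the reanalyses described above attribute to incomplete
individual-level models.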

Haney (1974) seems to be more cautious in dismissing the contextual
effects. He notes, "Contextual effects may disappear when initial
differences are fully controlled. Nevertheless, in a causal sense it is
almost surely true that contextual effects are sometimes real" (p. 44).
He continues:

The obvious solution to this causal uncertainty is
more powerful research designs than the non-experimental
cross-sectional sort of design used in
Project Follow Through or the Coleman study.
Contextual analysis in non-experimental studies
must be viewed with healthy skepticism. (p. 45)



Multilevel Analyses

The parallel analysis discussed earlier actually consists of two or
more single-level analyses (e.g., between-student analysis, between-class
analysis) with variables from a single level of aggregation involved.
In the contextual analysis, variables from two or more different levels
of aggregation are entered in a single analysis. Multilevel analyses are
defined here as requiring analyses in at least two stages for at least
two levels of units (Burstein & Linn, 1976). A few recently proposed
models are reviewed below.

Between-group, pooled within-group analysis. Cronbach (1976) argues
that overall between-student analyses are weighted averages of
between-class and pooled within-class analyses and are rarely advisable
in educational contexts. He notes that when heterogeneous within-class
slopes that may reflect the teacher or treatment effects are present, the
estimates based on the pooled within-class regression line probably are
biased. Cronbach suggests the analysis of data at the classroom level
(i.e., between-class analysis) and at the level of students within
classrooms (i.e., within-class analysis). A pooled within-group analysis
on the deviation scores from the mean is a feature of Cronbach's model
which distinguishes it from a parallel analysis.
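
The statement that the overall between-student regression is a weighted
average of the between-class and pooled within-class regressions is an
algebraic identity for simple regression, with the weight given by the
share of predictor variation that lies between classes. The sketch below
checks the identity on toy data; the function name and the scores are
hypothetical.

```python
def slopes_by_level(classes):
    """classes: list of lists of (x, y) pairs, one list per class.
    Returns (total, between-class, pooled within-class) slopes of y on x
    and the share of the x sum of squares lying between classes."""
    pupils = [p for c in classes for p in c]
    n = len(pupils)
    gx = sum(x for x, _ in pupils) / n
    gy = sum(y for _, y in pupils) / n
    # Overall between-student slope, computed directly from all pupils.
    b_total = (sum((x - gx) * (y - gy) for x, y in pupils)
               / sum((x - gx) ** 2 for x, _ in pupils))
    sxx_b = sxy_b = sxx_w = sxy_w = 0.0
    for c in classes:
        cx = sum(x for x, _ in c) / len(c)
        cy = sum(y for _, y in c) / len(c)
        # Class means as deviations from the grand mean (between part).
        sxx_b += len(c) * (cx - gx) ** 2
        sxy_b += len(c) * (cx - gx) * (cy - gy)
        # Pupil scores as deviations from the class mean (within part).
        sxx_w += sum((x - cx) ** 2 for x, _ in c)
        sxy_w += sum((x - cx) * (y - cy) for x, y in c)
    weight = sxx_b / (sxx_b + sxx_w)
    return b_total, sxy_b / sxx_b, sxy_w / sxx_w, weight


toy = [[(1, 2), (2, 3), (3, 3)],
       [(4, 3), (5, 5), (6, 4)],
       [(7, 9), (8, 8), (9, 10)]]
b_total, b_between, b_within, weight = slopes_by_level(toy)
recombined = weight * b_between + (1 - weight) * b_within
print(b_total, b_between, b_within, weight)
```

Because the overall slope blends two quantities that may differ in size
and even in sign, reporting the between-class and within-class slopes
separately preserves information that the blended slope hides.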

Using this model, Cronbach and Webb (1975) reanalyzed Anderson's (1941)
study, which reported finding an interaction of "drill vs. meaningful
methods of arithmetic instruction" with student ability and achievement.
In a reanalysis to separate between-class and within-class components of
the outcome-on-aptitude regression, the Aptitude x Treatment interaction
(ATI) findings disappeared. Cronbach and Webb also applied the model
to the Cooperative Reading Data (Bond & Dykstra, 1967) because of many
reported instances of ATI. Again they found that conventional kinds of
analyses (i.e., between-students analyses) combine between-class and
within-class effects in the sample and that some Aptitude x Treatment
. . .

Using the same methodology, Rakow, Airasian, and Madaus (1978)
reanalyzed data from American schools that had participated in the
International Study of Achievement in Mathematics (Husen, 1967). Rakow
et al. divided the within-school variation into two components, one
associated with differences between mathematics teachers and the other
with individual student differences. They found that "from thirty to
forty percent of the within-school variation traditionally classified as
individual student variance is associated with between-teacher
performance differences within schools" (p. 19). These findings tend to
support the further use of such types of analyses.

Regression analysis for hierarchical data. Keesling and Wiley (1974)
argued that school-level indices, such as average daily attendance, do
not convey independent information for each student within the school
and thus should not be included in between-student analyses. At the
same time, they indicated that the student-level data should be fitted
at the level of the student within the school. The Keesling-Wiley
analysis strategy includes: (1) a pooled within-school regression of
outcomes on individual characteristics, (2) aggregation of predicted
student outcomes over all students within a school, and (3) a
between-school regression of school mean outcomes on school
characteristics and school mean predicted outcomes.

Applying this method to the data from the Coleman Study, Keesling
and Wiley showed that the estimation of the school input effects could
be improved.
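
The three steps listed above can be sketched on synthetic data. This is
an illustration of the sequence only, not the Keesling-Wiley
implementation or the Coleman data: there is one individual
characteristic x and one school characteristic (called "resource"
here), and every name and coefficient is an assumption.

```python
import random


def ols(columns, y):
    """Least-squares coefficients (intercept first), obtained by solving
    the normal equations with Gauss-Jordan elimination."""
    rows = [[1.0] + list(r) for r in zip(*columns)]
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for i in range(p):
        pivot = max(range(i, p), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(p):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [v - f * w for v, w in zip(a[r], a[i])]
    return [a[i][p] / a[i][i] for i in range(p)]


rng = random.Random(4)
schools = []                       # list of (resource, [(x, y), ...])
for _ in range(60):
    composition = rng.gauss(0, 1)
    resource = 0.5 * composition + rng.gauss(0, 0.87)
    pupils = []
    for _ in range(30):
        x = composition + rng.gauss(0, 1)
        pupils.append((x, 0.5 * x + 0.8 * resource + rng.gauss(0, 1)))
    schools.append((resource, pupils))

# Step 1: pooled within-school regression (deviations from school means).
sxy = sxx = 0.0
for _, pupils in schools:
    mx = sum(x for x, _ in pupils) / len(pupils)
    my = sum(y for _, y in pupils) / len(pupils)
    sxy += sum((x - mx) * (y - my) for x, y in pupils)
    sxx += sum((x - mx) ** 2 for x, _ in pupils)
b_within = sxy / sxx

# Step 2: aggregate predicted pupil outcomes within each school. The
# constant term is omitted; it only shifts the step 3 intercept.
mean_predicted = [sum(b_within * x for x, _ in pupils) / len(pupils)
                  for _, pupils in schools]
mean_outcome = [sum(y for _, y in pupils) / len(pupils)
                for _, pupils in schools]
resources = [r for r, _ in schools]

# Step 3: between-school regression of mean outcomes on the school
# characteristic and the school mean predicted outcome.
_, b_resource, b_predicted = ols([resources, mean_predicted], mean_outcome)
print(b_within, b_resource, b_predicted)
```

In this constructed example the step 3 coefficient for the school
characteristic lands near the value of 0.8 used to generate the data.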

Analyses of slopes and intercepts. Burstein and Linn (1977) observed
that "the variation in βj (specific within-class slope for class
j) would become a potent source of information to researchers and
policymakers, especially when such information is combined with the
adjusted class effects" (p. 8). Their analytical strategy includes the
estimation of specific within-class slopes and between-class regressions
of class means and class slopes on teacher characteristics.
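
The strategy of treating class means and class slopes as outcomes can be
sketched as two stages: a separate regression within each class,
followed by between-class regressions on a teacher characteristic. The
data are synthetic, a single "quality" score stands in for teacher
characteristics, and all names and coefficients are assumptions rather
than the Burstein-Linn specification.

```python
import random


def simple_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


rng = random.Random(5)
quality, class_means, class_slopes = [], [], []
for _ in range(40):                                  # 40 classes
    q = rng.gauss(0, 1)                              # teacher quality
    aptitude = [rng.gauss(0, 1) for _ in range(25)]
    # Quality raises the class level (0.4) and steepens the slope (0.3).
    outcome = [0.4 * q + (0.5 + 0.3 * q) * a + rng.gauss(0, 1)
               for a in aptitude]
    # Stage 1: specific within-class slope for this class.
    _, slope_j = simple_regression(aptitude, outcome)
    quality.append(q)
    class_means.append(sum(outcome) / len(outcome))
    class_slopes.append(slope_j)

# Stage 2: between-class regressions on the teacher characteristic.
_, effect_on_mean = simple_regression(quality, class_means)
_, effect_on_slope = simple_regression(quality, class_slopes)
print(effect_on_mean, effect_on_slope)
```

The second-stage slope for class slopes recovers, approximately, how
strongly the teacher characteristic changes the aptitude-outcome
relationship, information that an analysis of class means alone does not
provide.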

Using simulated data, Burstein and Linn studied the analytical
consequences of heterogeneous within-class regressions using different
models, including their own, in education effects studies. A main
conclusion was that neither student-level nor class-level analysis
yielded correct estimates of teacher/class effects when there were
systematic differences in within-class slopes that were determined by
teacher quality.

Among the multilevel analysis models studied were Cronbach's
between-class within-class analysis (Cronbach & Webb, 1975), the
Keesling-Wiley analysis (Keesling & Wiley, 1974), and a slope-intercept
analysis (Burstein & Linn . . .) . . .
of the magnitude of teacher effects on mean class outcome; that is, all
models tended to overestimate the direct effects of teacher quality when
the "better" teachers had steep slopes, and tended to underestimate those
effects when the "better" teachers had flat slopes. In addition, the
Keesling-Wiley method showed an indication of bias in estimating teacher
effects on mean class outcomes. All these results seem to justify
Cronbach's (1976) caution about the possibility of developing a
universally successful strategy.

At the conclusion of his review of the unit of analysis issue, Haney 
(1974) made the following recommendations: 

First, investigators ought to have a strong bias for
studying various properties of the educational system
at the level at which they occur.

Second, variation in attributes of interest ought to
be studied at those levels (or between those units)
at which it does (or is expected to) occur. (p. 9)

Haney also advised researchers to make precise statements of the
hypotheses to be tested (in terms of mathematical models, if possible),
and to begin with strictly parallel analyses, if a researcher wants to
conduct parallel analyses at different levels. Haney further urged
researchers to treat classes and schools as units rather than as
aggregates.

Summary

It may be said that there are two contrasting schools of thought
relative to the units of analysis issue. One group of researchers holds
the opinion that, in schooling effects studies, the appropriate unit of
analysis is the individual student. This position is rationalized
because actual learning occurs at the individual level. Another group
argues that since educational treatments are normally administered at
the system level, the collective (e.g., classrooms, schools, etc.) is
the most appropriate unit of analysis.

A recently emerged position, held by a third group of researchers,
suggests that, since student achievement can be influenced by factors
existing at different levels of the educational system, data from
schooling effects studies should be analyzed at multiple levels. The
following three strategies for analyzing multilevel data were reviewed:
(1) parallel analyses, (2) contextual analyses, and (3) multilevel
analyses. An examination of these newly proposed techniques for the
multilevel analyses revealed that they did not provide completely
satisfactory results. Clearly, more research and development in this
methodological area are required.



V. METHODOLOGICAL TRENDS AND THEIR IMPLICATIONS
FOR RESEARCH ON SCHOOLING EFFECTS

In the process of examining research methodologies pertinent to
studies of schooling effects, the authors noted that observations made
by Dershimer and Iannaccone (1973), who earlier had examined social and
political influences on educational research, overlapped with their own
perceptions of trends in research methodology.

A review of that literature points out that few scientific
researchers, if any, select their problems at random. They
are influenced by several factors, such as the "excitement
of the chase," current scientific paradigms and theories,
chance observations the scientists happen to have made, the
dramatic nature of some phenomena, and the intellectual
stimulation derived from work on complex tasks. Researchers
are influenced by what their colleagues find important and
vital; they respond to society's opinion of their work. They
are sensitive to the interests of granting agencies or persons,
and they are influenced by their institutions' support and
provisions available for certain research tasks.
(Dershimer & Iannaccone, 1973, p. 113)



From the authors' point of view, the single most important influence
on trends in educational research methodology was, quite simply, federal
dollars; it was not, however, the only influence. During the 1960s, events
occurred that were to influence significantly the shape of American
education and, to some extent, the methodologies used by educational
researchers.

It would not now be in error to say, as does Mehan (1978), that "the
most prevalent view in this country is that differences in scholastic and
economic success are primarily the result of environmental influence
rather than genetic endowment" (p. 33). Consequently, it must be
difficult for some of us to comprehend why this view was not also
prevalent in the very early 1960s. For example, Deutsch (1964), in a
review of papers presented at a conference in the early 1960s on
preschool enrichment, observed:

the overall impact of these papers and of their examination
of the literature is to negate any concept of fixed
intelligence [emphasis added] and to foster the belief that the
human organism is highly malleable, particularly during its
early years. (p. 208)

It probably was Hunt's (1964) book on Intelligence and Experience that
first gave a measure of credence to this notion and in turn to the early
intervention movement funded initially by private foundations such as
Ford and Carnegie.

The fact that pupils in compensatory education programs made
cognitive gains in excess of what was expected eventually got the
attention of Congress. In the mid 1960s, Congress passed the Elementary
and Secondary Education Act (ESEA) and in so doing brought to life first
Head Start and later Follow Through. In its wisdom, Congress not only
demanded that schools should be held accountable for the manner in which
they spent monies, but also for the impact the school programs had on
students. Pursuant to its accountability concern, Congress authorized
a series of nationwide studies of programs funded by the federal
government. The passing of ESEA legislation and the commissioning of a
series of large-scale nationwide studies to assess the schooling effects
of federally supported programs had a direct and irrepressible influence
on educational research.

There was, however, one other important sociopolitical event that, in
retrospect, greatly influenced research methodologies in the study of
schooling effects. In 1964 the Civil Rights Act was passed and Congress
commissioned James Coleman (Coleman et al., 1966) to document the
suspected race-specific differences in the quality of public education (Shea,
1976).

In their attempt to respond to Congressional charges to study and
evaluate the nation's schools and special programs, behavioral and
social scientists came to realize that they lacked the methodological
tools to carry out appropriately this important social task. This
realization and the need to do something about it gave impetus to the use
and refinement of methodologies seldom used for educational research and
to the development of newer ones.

The following sections describe methodological trends in research on
schooling effects, as perceived by the authors, in four topical areas:
(1) study approaches, (2) independent variables, (3) indicators of effect,
and (4) analysis of data. The implications of these trends for the
conduct of future studies are also considered.

Trends in Study Approaches 

Rosenshine and Furst (1973) introduced "a fairly complete paradigm
for studying teaching" (p. 122), which corresponds fairly closely to the
study approach dimension as presented in Figure 1 of this document. Their
paradigm, which serves as a means of focusing the following discussion of
trends, contains at least three elements:

1. development of procedures for describing teaching
in a quantitative manner;

2. correlational studies in which the descriptive
variables are related to measures of student growth;

3. experimental studies in which the significant
variables obtained in the correlational studies
are tested in a more controlled situation. (p. 122)

Prior to the 1960s, study approaches to research on schooling effects
could be characterized as being almost exclusively limited in scope and
oriented toward the comparison of two or more experimental units; that is,
schooling effects research was essentially devoted to model building and
hypothesis (null hypothesis) testing (Cronbach, 1975). During this period,
true and quasi-experimental designs that were essentially univariate in
character were used extensively in investigations (Campbell & Stanley,
1963). In terms of Rosenshine and Furst's (1973) descriptive-correlational-
experimental loop, this period is demarcated by the "experimental" element.

The experimental approach to research on schooling effects continues
and, without doubt, has been employed frequently since the beginning of
the 1960s. For example, a federal edict to ESEA Title I directors making
them accountable for evaluating their programs actually led to an increase
in the use of experimental-type designs. However, most of the reports
submitted were judged to be of inferior quality and as a result have
contributed little to the schooling effects knowledge base. On the other
hand, the work by Horst, Tallmadge, and Wood (1975) has improved
methodology in this area. Of late, all levels of government are attempting
to standardize, within relatively narrow limits, the experimental
procedures that may be used in evaluating Title I programs (Tallmadge &
Horst, 1975).



In the 1960s, the convergence of high-speed electronic data-processing
equipment, advanced multivariate statistical software, and, perhaps, a
"too rapid increase of federal support for research on education" (Howe,
1976, p. 46) led to a series of relatively large-scale, nonexperimental,
multivariate studies. Some of these studies were initiated in response
to the Congressional request for nationwide studies of federally funded
educational programs. Among them were a series of studies on Follow
Through (e.g., Soar, 1973; Stallings & Kaskowitz, 1974). Other studies,
initiated in response to the Civil Rights Act of 1964, included the study
by Coleman et al. (1966) and its reanalyses by Jencks et al. (1972) and
by Mayeske et al. (1969).

In addition, it is important to note that interest in such studies
percolated down to state educational agencies, such as California, which
authorized, in conjunction with the National Institute of Education,
several relatively large-scale nonexperimental studies as well (e.g.,
McDonald & Elias, 1976; Tikunoff, Berliner, & Rist, 1975). Other studies
initiated at the state level include those of Brophy and Evertson (1975)
and Soar and Soar (1973).

The large-scale nonexperimental (or correlational) approach to
schooling effects has had an unprecedented effect on research methodology.
The old adage suggesting that "necessity is the mother of invention" could
never have been more true than during recent years. In attempting to
answer pressing questions about schooling, nonexperimental study
approaches have come of age. But since it is the expressed purpose of
such studies to generate hypotheses for subsequent testing under
experimental conditions, one may ask if large-scale experimental studies
are far behind.

What about the descriptive element of the Rosenshine and Furst
paradigm? Some interesting developments appear to be in the making.
The Tikunoff et al. (1975) ethnographic study of a sample of teachers
in the Beginning Teacher Evaluation Study (BTES) revealed, for
example:

that the methodology derived [illegible]
for future research in teaching, particularly
[illegible] where more effective teaching seems
to be occurring. (p. 19)

Mehan (1978), in a discussion of nonexperimental methodology, suggests that
"because it can address these problems, constitutive ethnography provides
a rigorous methodological alternative to large-scale surveys as a means
of guiding educational reform" (p. [illegible]).

It would appear, then, that all three of the elements in the Rosenshine
and Furst (1973) paradigm are actively employed and will become
increasingly important.

Trends in Studying Independent Variables

One of the earliest large-scale input-output studies of schooling
effects (Coleman et al., 1966) [illegible]
variables far removed from [illegible]
age of school building, etc.). The major finding of the Coleman Report

was that family background was a more important "determinant" of student
achievement than such inputs as the quality of schooling. In effect, per
pupil expenditures and school facilities were found to have little
relationship to student achievement (Shea, 1976). In a study using
simulated data modeled after the Coleman study (Mayeske et al., 1969) and
Project Talent (Flanagan et al., 1964; Jencks & Brown, 1975), Luecke and
McGinn (1975) indicated that their results:

suggest that studies which find little or no relationship between
educational inputs and achievement may be highly misleading. Our
findings suggest that the combination of data and statistical
technique [emphasis added] most often used is unlikely to reveal
such relationships even when they exist. (p. 34)



They also observed that "researchers who conceive of education
mechanistically, and use research designs which ignore the actions of individuals
in schools, will find results which confirm their assumptions" (p. 348).
Luecke and McGinn argued for a different category of input-type variables
in schooling effects studies:

For us, advancement will come through an improved understanding of
what actually takes place in schools and classrooms. Studies using
educational production functions must attend more to variables
pertinent to the educational production process, and less to
exogenous factors like family background. . . . this strategy may make
it possible to discern the kinds of inputs that can make schools
more effective institutions. We need to look more closely at what
teachers, principals and superintendents do as they assign resources
to students, teachers and schools, and to pay more attention to the
direct effects of their actions. Perhaps research will then be more
useful to those decisionmakers. (p. 343)

The Coleman Report and its offshoots (Jencks et al., 1972; Mayeske
et al., 1969; Mosteller & Moynihan, 1972) also "tended to minimize" the
role of the teacher in accounting for educational outcomes (Berliner,
197[illegible]). This finding stimulated a host of large-scale classroom process
studies or process-product research (e.g., Brophy & Evertson, 1974;
McDonald & Elias, 1976; Soar, 1973; Stallings & Kaskowitz, 1974; Tikunoff
et al., 1975).

Brunswik (1956) presented a classification schema in which
psychological variables were classified according to their remoteness from the
central processes of the behaving organism. This schema is useful in
understanding trends in selecting independent variables for schooling
effects studies. Brunswik used the terms "central," "proximal," and
"distal" to distinguish three broad regions of reference; "central" here
refers to events within the organism, "proximal" refers to events at the
interface between the organism and the environment, and "distal" suggests
events with which the organism is not in direct contact, over which the
organism does not exercise immediate control (Snow, 1968).
Using this schema, trends in the selection of independent variables
for large-scale studies appear to be moving from distal (e.g., Coleman
et al., 1966) to essentially proximal-central variables (e.g., Brophy
& Evertson, 1974; Soar, 1973; Tikunoff, Berliner, & Rist, 1975). The
Stallings and Kaskowitz (1973) and the McDonald and Elias (1973) studies
examined variables in all three regions (i.e., distal, proximal, and central
variables).

From the perspective of the authors, it would appear that schooling
[illegible] proximal-central variables, but not
necessarily at the expense of distal ones. Within the central region,
there is some indication of a shift toward a more detailed examination of
the student behavior (McDonald & Elias, 1976; Tikunoff et al., 1975).
In this latter respect, ethnographic techniques such as those used in the
Tikunoff et al. (1975) study may prove quite useful.

Trends in Indicators of Effects

Since the 1960s, an increased use of all types of indicators of
schooling effects is evident. Status attainment or outcome data were
collected and analyzed for the Coleman Report (Coleman et al., 1966),
for the National Assessment of Educational Progress (NAEP, 1974), and in
a host of statewide assessment programs (e.g., Pennsylvania Department of
Education, 1973). The continued use of status attainment data is expected
and its use should even increase as schools begin to establish minimum
competency levels as the basis for granting certain diplomas.

Most large-scale short-term schooling effects studies employed some
form of difference scores for analysis. For example, unadjusted change
or "gain" scores were used in the McDonald and Elias (1976) study, and
residual scores were used in studies by Soar (1973) and Stallings and
Kaskowitz (1974).

Educational practitioners interested in determining the
relationships between educational improvement efforts and short-term student
achievement will find the residual score to be of use where initial
student differences cannot be controlled.
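
The difference between unadjusted gain scores and residual scores can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions, not a reconstruction of the procedures used in the studies cited:
the pretest and posttest values and the function names are hypothetical, and
the residual score is taken here to be the posttest minus the posttest
predicted from a simple least-squares regression of posttest on pretest.

```python
from statistics import mean

def gain_scores(pre, post):
    """Unadjusted change: posttest minus pretest for each pupil."""
    return [y - x for x, y in zip(pre, post)]

def residual_scores(pre, post):
    """Posttest minus the posttest predicted from pretest by least squares."""
    mx, my = mean(pre), mean(post)
    sxx = sum((x - mx) ** 2 for x in pre)
    sxy = sum((x - mx) * (y - my) for x, y in zip(pre, post))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]

# Hypothetical pretest and posttest scores for five pupils (assumed data).
pre = [10, 20, 30, 40, 50]
post = [22, 28, 41, 47, 62]

gains = gain_scores(pre, post)
residuals = residual_scores(pre, post)
```

By construction the residuals sum to zero and are uncorrelated with the
pretest, which is the property that makes them attractive when initial
differences among students cannot be controlled by design.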



Trends in Data Analysis

With the advent of modern electronic data processing systems and
the development of increasingly sophisticated statistical software
packages, there has been a clear tendency to employ multivariate
analyses (Cooley, 1965; Tatsuoka, 1973). At the same time, with the
realization that the relationship between certain classroom process variables
and outcome variables may be nonlinear, there has been an increase in
the examination of both linear and nonlinear bivariate relations or
regressions (e.g., Brophy & Evertson, 1974; Soar, 1973).
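
Why a nonlinear regression may be needed can be shown with a small,
purely hypothetical example. In the sketch below, an invented classroom
process measure has an inverted-U relation to an invented outcome; a
straight line accounts for none of the outcome variance, while a quadratic
accounts for all of it. The data, the function names, and the choice of a
quadratic are assumptions made for illustration only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(a, b)

def r_squared(xs, ys, coefs):
    """Proportion of outcome variance accounted for by the fitted curve."""
    mean_y = sum(ys) / len(ys)
    fitted = [sum(c * x ** power for power, c in enumerate(coefs)) for x in xs]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: the outcome rises, peaks, and then falls.
process = [1, 2, 3, 4, 5, 6, 7]
outcome = [2, 7, 10, 11, 10, 7, 2]

r2_linear = r_squared(process, outcome, fit_polynomial(process, outcome, 1))
r2_quadratic = r_squared(process, outcome, fit_polynomial(process, outcome, 2))
```

A study that examined only the linear relation in such data would report
no relationship at all, even though the process measure determines the
outcome completely.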

Another important trend in schooling effects studies is the
increasing tendency to analyze data at the individual student level.
Perhaps more important is the trend to employ multilevel analyses (e.g.,
Burstein, 1976; Cronbach & Webb, 1975).
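
One elementary idea behind multilevel analysis is that variation in
student scores can be separated into a between-classroom part and a
within-classroom part. The sketch below illustrates that decomposition
with hypothetical scores for three classrooms; it is a simplified
illustration and not the specific procedure proposed by the authors cited.

```python
from statistics import mean

# Hypothetical achievement scores grouped by classroom (assumed data).
classrooms = {
    "A": [52, 55, 58],
    "B": [60, 64, 68],
    "C": [70, 71, 75],
}

all_scores = [s for scores in classrooms.values() for s in scores]
grand_mean = mean(all_scores)

# Between-classroom part: distance of each class mean from the grand mean,
# weighted by class size.
ss_between = sum(
    len(scores) * (mean(scores) - grand_mean) ** 2
    for scores in classrooms.values()
)

# Within-classroom part: distance of each pupil from his or her class mean.
ss_within = sum(
    (s - mean(scores)) ** 2
    for scores in classrooms.values()
    for s in scores
)

ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
share_between = ss_between / ss_total
```

The two parts add up to the total sum of squares. An analysis carried out
only on class means ignores the within-classroom part, and an analysis
that pools all students ignores the grouping; a multilevel analysis
attends to both.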

The search for differentiated effects or interactions across
different students, teachers, schools, and/or programs is on the upswing
(e.g., Brophy, 1977; Cronbach & Snow, 1977; Soar & Soar, 1975).
However, in spite of more than a decade of research, there still are no
consistent findings resulting from aptitude-treatment interaction
studies (Cronbach, 1973). This seems to imply that further research
is needed in the areas of higher-order interactions and/or
differentiated nonlinear relationships. Another implication is that researchers
need to conceptualize schemas by which the findings across studies
can be s[illegible] (e.g., Medley, 19[illegible]) and areas that require further
investigation identified.

Another important new trend is that of synthesizing the findings
across studies using meta analysis techniques (Glass,
1976) so as to arrive at an overall index of, for example, program
effectiveness. Meta analysis and the conceptual schema mentioned
above represent extremely important methodological developments
for researchers in their attempts to build comprehensive knowledge bases
and construct new theories.
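
As a minimal sketch of how such an overall index might be computed, the
example below expresses each study's result as a standardized mean
difference (treatment mean minus control mean, divided by the control-group
standard deviation) and then averages across studies. The three sets of
summary statistics are hypothetical, and the simple unweighted average is
an assumption made for brevity.

```python
from statistics import mean

# Hypothetical summary statistics from three studies of one program:
# (treatment-group mean, control-group mean, control-group standard deviation).
studies = [
    (54.0, 50.0, 10.0),
    (103.0, 100.0, 15.0),
    (27.0, 25.0, 4.0),
]

def effect_size(treatment_mean, control_mean, control_sd):
    """Standardized mean difference, scaled by the control-group SD."""
    return (treatment_mean - control_mean) / control_sd

effect_sizes = [effect_size(*study) for study in studies]

# Unweighted average across studies, used here as the overall index.
overall_index = mean(effect_sizes)
```

Because each result is expressed in standard deviation units, studies that
used different tests can be placed on a common scale before they are
combined. A fuller analysis might weight studies by sample size.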



Summary 



Prior to the 1960s, educational research on schooling effects could
be characterized generally as limited in scope, devoted to model building
and hypothesis testing (Cronbach, 1975), rarely including formal
observations of the behavior of teachers when they taught or of pupils when
they learned (Medley & Mitzel, 1963), univariate in approach (Kerlinger
& Pedhazur, 1973), and dedicated to the quest for nomothetic theory
(Cronbach, 1975). In short, it was an era during which the predominant
methodological approach to the study of schooling effects was the
small-scale, nonprocess-oriented, essentially univariate experiment concerned
with the discovery of universally applicable laws.

The 1960s represented a turning point in research on schooling
effects. Spurred on particularly by the Coleman Report (Coleman et al.,
1966) and by Congressional authorization to study Head Start and Follow
Through on a nationwide basis, educational researchers reexamined closely
their research methodologies. Since the late 1960s, the research on
schooling effects receiving the most attention has been large-scale,
multiregional (i.e., distal-proximal-central), multivariate, and
nonexperimental in focus. During this period, the unit of analysis has
shifted from the school district to the classroom and individual student
level, and, more importantly, to multilevel units.

This statement of trends should not be taken to imply that
methodologies used to study schooling effects prior to the 1960s are no longer
being employed; indeed, almost without exception, they exist side by side
with current methodological innovations. On the whole, the authors were
hard pressed to find examples in which established research methodologies
were totally discarded in lieu of "innovative" procedures. Nor were
many "new" methodologies discerned. However, methodologies have changed;
they have become more refined. Shulman's (1970) observations are relevant:

The present era is one of significant methodological progress in
the behavioral sciences and education. The development of new
techniques, especially in the multivariate domain, proceeds at a
rate which dazzles the non-specialist, even though in the eyes of
the educational statistician, most of the "new developments" are
merely variations on a few major themes. (p. 390)

Whether these methodological trends are regarded as a methodological 
advancement or as mere variations of existing methods depends upon one's 
point of view. Methodological trends, regardless of whether they are 
methodological advancements or refinements, seem to provide educational
researchers with better perspectives on educational development.
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